
Normalizing vote type fields

  • Josh Tauberer
    Message 1 of 4 , Feb 16, 2011
      Hey, all.

      I am thinking of normalizing the "type" field in roll call vote files.
      Most won't change but I'll correct typos in the upstream data sources
      and I may revise some (like "Call of the House" to "Quorum Call" where
      appropriate). I'll also add a category attribute to the element to group
      similar vote types together (on passage, suspend the rules and pass, on
      agreeing to the resolution, on the amendment, etc. all labeled e.g.
      "passage").

      This is fair warning that I'll be changing that field on existing data
      real soon now. If that'll cause any problems for you (it really
      shouldn't), speak now!

      --
      - Josh Tauberer
      - CivicImpulse / GovTrack.us

      http://razor.occams.info | www.govtrack.us | civicimpulse.com

      "Members of both sides are reminded not to use guests of the
      House as props."
    • Eric Mill
      Message 2 of 4 , Feb 16, 2011
        I've actually been doing some normalization using that field as well,
        for our Real Time Congress API. It only works for a subset of them, of
        course, but my regexes are pretty straightforward:
        https://github.com/sunlightlabs/realtimecongress/blob/master/tasks/utils.rb#L69

        You might want to consider retaining the original textual vote type,
        and adding a new field with the standard vote type, so that a) it's
        backwards compatible, and b) it lets people see the original text and
        come up with their own normalization strategy if they want.

        In RTC, I leave the original text as the "roll_type" field, and then
        the normalized field as "vote_type". The feed of votes contains voice
        votes, as pulled from GovTrack's <vote> tags on bills, whose
        "vote_type" is always set to "passage", and which do not have a
        "roll_type".
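
A minimal Python sketch of the two-field layout described above, assuming a vote is represented as a plain dict. The function name `build_vote` and the toy `normalize` callable are hypothetical and are not taken from the Real Time Congress code; only the field names "roll_type" and "vote_type" come from the message.

```python
def build_vote(roll_type, normalize):
    """Return a vote record that keeps the original text next to the normalized value.

    roll_type: the original vote type text, or None for a voice vote.
    normalize: any callable mapping the original text to a standard vote type.
    """
    if roll_type is None:
        # Voice votes carry no original text; their vote_type is always "passage".
        return {"vote_type": "passage"}
    return {"roll_type": roll_type, "vote_type": normalize(roll_type)}


def toy_normalize(text):
    # Hypothetical stand-in for a real set of regexes; covers one case only.
    return "passage" if text.lower() == "on passage" else None


roll_call = build_vote("On Passage", toy_normalize)
voice = build_vote(None, toy_normalize)
```

Because the original text is never overwritten, consumers who disagree with the normalization can still apply their own rules to "roll_type".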

        -- Eric

      • Josh Tauberer
        Message 3 of 4 , Feb 16, 2011
          The type is also repeated in the <question> element.

          For whatever reason, the scrapers make the <question> be TYPE + " :" +
          BILL/AMENDMENT/OTHER in the House or TYPE + " (" + MATTER + ")" in the
          Senate. At some point the arbitrary colon-versus-parens difference
          should be removed as well.
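
To make the two layouts concrete, here is a rough Python sketch that splits a question string back into its type and its subject. The function name `split_question`, the chamber labels "house" and "senate", and the sample strings are all assumptions for illustration; the real files may differ in spacing or contain parentheses inside the type itself, which this sketch does not handle.

```python
def split_question(question, chamber):
    """Split a question string into (type, subject) using the per-chamber layout.

    House layout:  TYPE + " :" + BILL/AMENDMENT/OTHER
    Senate layout: TYPE + " (" + MATTER + ")"
    Returns (question, None) when no separator is found.
    """
    if chamber == "house":
        vote_type, sep, subject = question.partition(" :")
        if sep:
            return vote_type.strip(), subject.strip()
    elif chamber == "senate":
        start = question.find(" (")
        if start != -1 and question.endswith(")"):
            # Drop the " (" prefix and the trailing ")".
            return question[:start].strip(), question[start + 2:-1].strip()
    return question.strip(), None
```

A consumer that only cares about the type could call this once per vote and ignore the chamber-specific punctuation afterwards.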

          Since the <type> is also present in the <question>, I think it makes
          sense to normalize one and leave the other. What I'm intending right now is:

          First, normalize the <type> element by:

          1) Removing differences in capitalization and phrasing which would be
          irrelevant to anyone but a parliamentarian. So "On the Amendment" and
          "On agreeing to the amendment" both become "On the Amendment".

          2) Removing part specifications in divided questions and the like. "On
          Agreeing to Article X of the Concurrent Resolution" becomes "On Agreeing
          to the Concurrent Resolution (Part)" and "On Adoption of the fifth
          portion of the divided question" becomes "On Part of the Divided Question".
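
As a rough illustration of rules 1 and 2, the Python sketch below normalizes only the example strings quoted above. The names `PHRASING`, `PART_RULES` and `normalize_type` are hypothetical, and the patterns are a guess at how such rules could be written, not the rules GovTrack actually uses.

```python
import re

# Rule 1: phrasing and capitalization variants that collapse to one form.
PHRASING = {
    "on the amendment": "On the Amendment",
    "on agreeing to the amendment": "On the Amendment",
}

# Rule 2: drop the part specification from divided questions and the like.
PART_RULES = [
    (re.compile(r"^on agreeing to article \S+ of the concurrent resolution$", re.I),
     "On Agreeing to the Concurrent Resolution (Part)"),
    (re.compile(r"^on adoption of the \w+ portion of the divided question$", re.I),
     "On Part of the Divided Question"),
]


def normalize_type(raw):
    """Return the normalized type, or the whitespace-cleaned input if no rule applies."""
    text = " ".join(raw.split())  # collapse runs of whitespace
    for pattern, replacement in PART_RULES:
        if pattern.match(text):
            return replacement
    return PHRASING.get(text.lower(), text)
```

Types that match no rule pass through unchanged, which fits the expectation that most values will not change.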

          Second, I will add a "category" attribute that collapses similar types.
          Right now I have: passage, passage-suspension, passage-part, amendment,
          nomination, procedural, cloture, and other. "passage-part" is for "On
          Agreeing to Article X of the Concurrent Resolution". "other" is so far
          only for divided questions.
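
A small Python sketch of how the category could be attached, assuming the attribute sits on the <type> element and that categories are looked up from the normalized type. The table `CATEGORY_BY_TYPE` is hypothetical: only the "passage-part" and "other" rows follow directly from the description above, and the remaining rows are illustrative guesses.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup from normalized type to category.
CATEGORY_BY_TYPE = {
    "On Passage": "passage",
    "On Agreeing to the Resolution": "passage",
    "On Motion to Suspend the Rules and Pass": "passage-suspension",
    "On Agreeing to the Concurrent Resolution (Part)": "passage-part",
    "On the Amendment": "amendment",
    "On the Nomination": "nomination",
    "On the Cloture Motion": "cloture",
    "Quorum Call": "procedural",
    "On Part of the Divided Question": "other",
}


def category_for(vote_type):
    """Return the category for a normalized type, or None if it is not in the table."""
    return CATEGORY_BY_TYPE.get(vote_type)


def type_element(vote_type):
    """Serialize a <type> element, adding the category attribute when one is known."""
    element = ET.Element("type")
    category = category_for(vote_type)
    if category is not None:
        element.set("category", category)
    element.text = vote_type
    return ET.tostring(element, encoding="unicode")
```

Leaving the attribute off for unknown types, rather than guessing, is a design choice of this sketch.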

          With these categories, when the category is passage,
          passage-suspension, amendment, or cloture, there's nothing informative
          in the <question> element that's not already indicated by the category
          (except for the bill or amendment number, which is also indicated in
          the bill/amendment elements), so you can get away with displaying a
          fixed, user-friendly string for each of these categories rather than
          the original type.
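
For instance, a display layer could look something like the Python sketch below. The label strings, the name `DISPLAY_LABELS` and the function `display_text` are made up for illustration; any other category falls back to the original question text.

```python
# Hypothetical user-friendly labels for the four self-describing categories.
DISPLAY_LABELS = {
    "passage": "Passage",
    "passage-suspension": "Passage under Suspension of the Rules",
    "amendment": "Amendment",
    "cloture": "Cloture",
}


def display_text(category, question, number=None):
    """Return a fixed label for self-describing categories, else the raw question.

    number: optional bill or amendment number taken from the bill/amendment elements.
    """
    label = DISPLAY_LABELS.get(category)
    if label is None:
        return question
    return f"{label}: {number}" if number else label
```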

          But it's still open for debate.

          - Josh Tauberer
          - CivicImpulse / GovTrack.us


        • Eric Mill
          Message 4 of 4 , Feb 17, 2011
            That sounds like a good approach. When I was doing this myself, I also
            contemplated normalizing it into two fields: vote_type, and vote_on.
            So a vote on an amendment would have a vote_type of "passage" and a
            vote_on of "amendment". And a vote to start debate on a Supreme Court
            nomination would be a vote_type of "cloture" and a vote_on of
            "nomination".
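
In Python terms, the two-field idea could be sketched like this. The class name `VoteKind` is hypothetical; the field names and the two example values are the ones given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoteKind:
    vote_type: str  # what kind of action the vote is, e.g. "passage" or "cloture"
    vote_on: str    # what the vote is about, e.g. "amendment" or "nomination"


# The two examples from the paragraph above.
amendment_vote = VoteKind(vote_type="passage", vote_on="amendment")
nomination_cloture = VoteKind(vote_type="cloture", vote_on="nomination")
```

Keeping the two axes separate lets a consumer filter on either one, for example all cloture votes regardless of what they are on.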

            I gave up when I realized I didn't have enough information available
            to me in GovTrack's data (or in House/Senate feeds) to establish that
            a vote was a cloture vote on a nomination. Maybe you have more
            information, or a better handle on the problem though?

            -- Eric
