Re: [govtrack] Normalizing vote type fields

  • Josh Tauberer
    Message 1 of 4 , Feb 16, 2011
      The type is also repeated in the <question> element.

      For whatever reason, the scrapers make the <question> be TYPE + " :" +
      BILL/AMENDMENT/OTHER in the House or TYPE + " (" + MATTER + ")" in the
      Senate. At some point the arbitrary colon-versus-parens difference
      should be removed as well.
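
A minimal sketch of splitting a <question> string back into its type and
its bill/amendment/matter part, assuming only the two shapes described
above. The function name split_question and the chamber argument are
hypothetical and are not part of the GovTrack scrapers.

```python
import re

def split_question(question, chamber):
    """Split a <question> string into (type, matter).

    Assumed shapes, taken from the description above:
      House:  TYPE + " :" + BILL/AMENDMENT/OTHER
      Senate: TYPE + " (" + MATTER + ")"
    `chamber` is a hypothetical argument: "house" or "senate".
    """
    text = question.strip()
    if chamber == "senate":
        # Lazy type, then everything inside the outermost parentheses.
        m = re.match(r"^(.*?) \((.*)\)$", text)
    else:
        # Tolerate optional spaces on either side of the colon.
        m = re.match(r"^(.*?) ?: ?(.*)$", text)
    if m is None:
        # No separator found: treat the whole string as the type.
        return text, None
    return m.group(1).strip(), m.group(2).strip()
```

Passing the chamber explicitly avoids guessing the format from the text,
since a House question could itself end in a parenthesis.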

      Since the <type> is also present in the <question>, I think it makes
      sense to normalize one and leave the other. What I'm intending right now is:

      Normalize the <type> element by:

      1) Removing differences in capitalization and phrasing which would be
      irrelevant to anyone but a parliamentarian. So "On the Amendment" and
      "On agreeing to the amendment" both become "On the Amendment".

      2) Removing part specifications in divided questions and the like. "On
      Agreeing to Article X of the Concurrent Resolution" becomes "On Agreeing
      to the Concurrent Resolution (Part)" and "On Adoption of the fifth
      portion of the divided question" becomes "On Part of the Divided Question".
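
The two rules above could be sketched as a small ordered rule table. This
is only an illustration built from the examples in this message: the
patterns, the table name and normalize_type are hypothetical, and a real
rule set would need many more entries.

```python
import re

# Hypothetical rule table: (pattern, replacement). The first match wins.
_RULES = [
    # Rule 2: drop part specifications in divided questions and the like.
    (re.compile(r"^on agreeing to article \w+ of the concurrent resolution$", re.I),
     "On Agreeing to the Concurrent Resolution (Part)"),
    (re.compile(r"^on adoption of the \w+ portion of the divided question$", re.I),
     "On Part of the Divided Question"),
    # Rule 1: collapse capitalization and phrasing variants.
    (re.compile(r"^on (agreeing to )?the amendment$", re.I),
     "On the Amendment"),
]

def normalize_type(raw):
    """Return the normalized form of a vote type, or the cleaned input."""
    text = " ".join(raw.split())  # collapse runs of whitespace
    for pattern, replacement in _RULES:
        if pattern.match(text):
            return replacement
    return text  # most types are left unchanged
```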

      Second, I will add a "category" attribute that collapses similar types.
      Right now I have: passage, passage-suspension, passage-part, amendment,
      nomination, procedural, cloture, and other. "passage-part" is for "On
      Agreeing to Article X of the Concurrent Resolution". "other" is so far
      only for divided questions.

      With these categories, when the category is passage, passage-suspension,
      amendment, or cloture, there's nothing informative in the <question>
      element that's not indicated by this category (except for the bill or
      amendment number, which is also indicated in the bill/amendment
      elements), so you can get away with displaying a fixed, user-friendly
      string for each of these categories rather than the original type.
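
A rough sketch of how a consumer might use the category to pick a display
string. The category names come from the list above. The type strings that
were not quoted in this message, the display wording, and the names
CATEGORY_BY_TYPE, DISPLAY and display_label are all assumptions made for
illustration.

```python
# Hypothetical mapping from normalized type to category. Only the category
# names are taken from the message; several type strings are assumed.
CATEGORY_BY_TYPE = {
    "On Passage": "passage",
    "On Motion to Suspend the Rules and Pass": "passage-suspension",
    "On Agreeing to the Concurrent Resolution (Part)": "passage-part",
    "On the Amendment": "amendment",
    "On the Nomination": "nomination",
    "On the Cloture Motion": "cloture",
    "On Part of the Divided Question": "other",
}

# Fixed strings for the four categories where the <question> adds nothing
# beyond the bill or amendment number. The wording is an assumption.
DISPLAY = {
    "passage": "Vote on passage",
    "passage-suspension": "Vote on passage under suspension of the rules",
    "amendment": "Vote on an amendment",
    "cloture": "Vote on cloture",
}

def display_label(vote_type, question):
    """Return a fixed label when the category has one, else the question."""
    category = CATEGORY_BY_TYPE.get(vote_type)  # None if uncategorized
    return DISPLAY.get(category, question)
```

Categories without a fixed string (nomination, procedural, passage-part,
other) fall back to the original <question> text here, since that is where
the remaining detail lives.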

      But it's still open for debate.

      - Josh Tauberer
      - CivicImpulse / GovTrack.us

      http://razor.occams.info | www.govtrack.us | civicimpulse.com

      "Members of both sides are reminded not to use guests of the
      House as props."

      On 02/16/2011 10:29 AM, Eric Mill wrote:
      > I've actually been doing some normalization using that field as well,
      > for our Real Time Congress API. It only works for a subset of them, of
      > course, but my regexes are pretty straightforward:
      > https://github.com/sunlightlabs/realtimecongress/blob/master/tasks/utils.rb#L69
      >
      > You might want to consider retaining the original textual vote type,
      > and adding a new field with the standard vote type, so that a) it's
      > backwards compatible, and b) it lets people see the original text and
      > come up with their own normalization strategy if they want.
      >
      > In RTC, I leave the original text as the "roll_type" field, and then
      > the normalized field as "vote_type". The feed of votes contains voice
      > votes, as pulled from GovTrack's <vote> tags on bills, whose
      > "vote_type" is always set to "passage", and which do not have a
      > "roll_type".
      >
      > -- Eric
      >
      > On Wed, Feb 16, 2011 at 10:20 AM, Josh Tauberer <tauberer@...> wrote:
      >> Hey, all.
      >>
      >> I am thinking of normalizing the "type" field in roll call vote files.
      >> Most won't change but I'll correct typos in the upstream data sources
      >> and I may revise some (like "Call of the House" to "Quorum Call" where
      >> appropriate). I'll also add a category attribute to the element to group
      >> similar vote types together (on passage, suspend the rules and pass, on
      >> agreeing to the resolution, on the amendment, etc. all labeled e.g.
      >> "passage").
      >>
      >> This is fair warning that I'll be changing that field on existing data
      >> real soon now. If that'll cause any problems for you (it really
      >> shouldn't), speak now!
      >>
      >> --
      >> - Josh Tauberer
      >> - CivicImpulse / GovTrack.us
      >>
      >> http://razor.occams.info | www.govtrack.us | civicimpulse.com
      >>
      >> "Members of both sides are reminded not to use guests of the
      >> House as props."
    • Eric Mill
      Message 2 of 4 , Feb 17, 2011
        That sounds like a good approach. When I was doing this myself, I also
        contemplated normalizing it into two fields: vote_type, and vote_on.
        So a vote on an amendment would have a vote_type of "passage" and a
        vote_on of "amendment". And a vote to start debate on a Supreme Court
        nomination would be a vote_type of "cloture" and a vote_on of
        "nomination".
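
As a small illustration of the two-field idea, the pair could be modeled
as a record like the one below. The field names vote_type and vote_on are
the ones used above. The class name VoteKind is hypothetical and is not
taken from the Real Time Congress code.

```python
from typing import NamedTuple

class VoteKind(NamedTuple):
    vote_type: str  # the kind of decision, e.g. "passage" or "cloture"
    vote_on: str    # the thing being decided, e.g. "amendment" or "nomination"

# The two examples from the paragraph above.
amendment_vote = VoteKind(vote_type="passage", vote_on="amendment")
nomination_cloture = VoteKind(vote_type="cloture", vote_on="nomination")
```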

        I gave up when I realized I didn't have enough information available
        to me in GovTrack's data (or in House/Senate feeds) to establish that
        a vote was a cloture vote on a nomination. Maybe you have more
        information, or a better handle on the problem though?

        -- Eric
