Re: [govtrack] Normalizing vote type fields
That sounds like a good approach. When I was doing this myself, I also
contemplated normalizing it into two fields: vote_type and vote_on.
So a vote on an amendment would have a vote_type of "passage" and a
vote_on of "amendment". And a vote to start debate on a Supreme Court
nomination would have a vote_type of "cloture" and a vote_on of
"nomination".
I gave up when I realized I didn't have enough information available
to me in GovTrack's data (or in House/Senate feeds) to establish that
a vote was a cloture vote on a nomination. Maybe you have more
information, or a better handle on the problem though?
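Here is a minimal Python sketch of the two-field idea. It assumes a simple keyword match on the raw type string, and the class and function names are hypothetical, not part of GovTrack's data. It also shows the limitation described above: a cloture vote on a nomination falls back to a vote_on of "bill", because the type string alone does not say what the cloture motion concerns.

```python
from typing import NamedTuple


class SplitVoteType(NamedTuple):
    vote_type: str  # kind of vote, e.g. "passage" or "cloture"
    vote_on: str    # what is voted on, e.g. "bill", "amendment", "nomination"


def split_vote_type(question_type: str) -> SplitVoteType:
    """Hypothetical keyword-based split of a raw type string into two fields."""
    text = question_type.lower()
    vote_type = "cloture" if "cloture" in text else "passage"
    if "amendment" in text:
        vote_on = "amendment"
    elif "nomination" in text:
        vote_on = "nomination"
    else:
        vote_on = "bill"  # fallback: the type string gives no other hint
    return SplitVoteType(vote_type, vote_on)
```

For example, `split_vote_type("On the Cloture Motion")` yields a vote_type of "cloture" but cannot tell that the motion was on a nomination.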
On Wed, Feb 16, 2011 at 6:48 PM, Josh Tauberer <tauberer@...> wrote:
> The type is also repeated in the <question> element.
> For whatever reason, the scrapers make the <question> be TYPE + " :" +
> BILL/AMENDMENT/OTHER in the House or TYPE + " (" + MATTER + ")" in the
> Senate. At some point the arbitrary colon-versus-parens difference should be
> removed as well.
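The two question formats can be split back into type and matter. This Python sketch assumes the House type never contains a colon and the Senate type never contains a parenthesis; the chamber names "house" and "senate" and the function name are illustrative only.

```python
import re
from typing import Optional, Tuple

# Senate style: TYPE + " (" + MATTER + ")"; the matter may itself contain parens.
SENATE_RE = re.compile(r"^(?P<type>.*?)\s*\((?P<matter>.*)\)\s*$")


def split_question(question: str, chamber: str) -> Tuple[str, Optional[str]]:
    """Split a scraped <question> back into (type, matter); matter may be None."""
    if chamber == "house":
        # House style: TYPE + " :" + BILL/AMENDMENT/OTHER
        vote_type, sep, matter = question.partition(":")
        return vote_type.strip(), (matter.strip() if sep else None)
    m = SENATE_RE.match(question)
    if m:
        return m.group("type"), m.group("matter")
    return question.strip(), None
```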
> Since the <type> is also present in the <question>, I think it makes sense
> to normalize one and leave the other. What I'm intending right now is:
> First, normalize the <type> element by:
> 1) Removing differences in capitalization and phrasing which would be
> irrelevant to anyone but a parliamentarian. So "On the Amendment" and "On
> agreeing to the amendment" both become "On the Amendment".
> 2) Removing part specifications in divided questions and the like. "On
> Agreeing to Article X of the Concurrent Resolution" becomes "On Agreeing to
> the Concurrent Resolution (Part)" and "On Adoption of the fifth portion of
> the divided question" becomes "On Part of the Divided Question".
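The two normalization rules can be expressed as an ordered table of case-insensitive regular expressions. This sketch covers only the examples quoted in the message; the patterns and the `normalize_type` name are hypothetical, and a real table would need many more entries.

```python
import re

# Hypothetical rule table: (pattern, replacement), tried in order.
RULES = [
    # 1) phrasing/capitalization variants of the same question
    (re.compile(r"^on (agreeing to )?the amendment$", re.I), "On the Amendment"),
    # 2) part specifications in divided questions and the like
    (re.compile(r"^on agreeing to article \w+ of the concurrent resolution$", re.I),
     "On Agreeing to the Concurrent Resolution (Part)"),
    (re.compile(r"^on adoption of the \w+ portion of the divided question$", re.I),
     "On Part of the Divided Question"),
]


def normalize_type(raw: str) -> str:
    """Return the normalized <type> text, or the input with whitespace collapsed."""
    text = " ".join(raw.split())
    for pattern, replacement in RULES:
        if pattern.match(text):
            return replacement
    return text
```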
> Second, I will add a "category" attribute that collapses similar types.
> Right now I have: passage, passage-suspension, passage-part, amendment,
> nomination, procedural, cloture, and other. "passage-part" is for "On
> Agreeing to Article X of the Concurrent Resolution". "other" is so far only
> for divided questions.
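One way to derive the category attribute is a chain of keyword checks over the normalized type, testing the more specific categories first. The keywords, the set of passage types, and the fallback to "procedural" are assumptions for illustration; the message does not say how each type is assigned.

```python
# Hypothetical set of normalized types treated as plain passage votes.
PASSAGE_TYPES = {
    "on passage",
    "on agreeing to the resolution",
    "on agreeing to the concurrent resolution",
}


def categorize(normalized_type: str) -> str:
    """Map a normalized <type> string to a coarse category (illustrative rules)."""
    t = normalized_type.lower()
    if "divided question" in t:
        return "other"
    if t.endswith("(part)"):
        return "passage-part"
    if "suspend the rules" in t:
        return "passage-suspension"
    if "cloture" in t:
        return "cloture"
    if "amendment" in t:
        return "amendment"
    if "nomination" in t:
        return "nomination"
    if t in PASSAGE_TYPES:
        return "passage"
    return "procedural"  # assumed fallback for everything else
```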
> With these categories, when the category is passage, passage-suspension,
> amendment, or cloture, there's nothing informative in the <question>
> element that's not indicated by this category (except for the bill or
> amendment number, which is also indicated in the bill/amendment elements),
> so you can get away with displaying a fixed, user-friendly string for each
> of these categories rather than the original type.
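Displaying a fixed string for those four categories, and the original question otherwise, could look like this. The label wording is made up for the example; the bill or amendment number would still come from the bill/amendment elements.

```python
# Hypothetical user-facing labels for categories whose <question> adds nothing.
FIXED_LABELS = {
    "passage": "On Passage",
    "passage-suspension": "On Passage (Suspension)",
    "amendment": "On an Amendment",
    "cloture": "On Cloture",
}


def display_label(category: str, question: str) -> str:
    """Return the fixed label if one exists, otherwise the original question."""
    return FIXED_LABELS.get(category, question)
```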
> But it's still open for debate.
> - Josh Tauberer
> - CivicImpulse / GovTrack.us
> http://razor.occams.info | www.govtrack.us | civicimpulse.com
> "Members of both sides are reminded not to use guests of the
> House as props."
> On 02/16/2011 10:29 AM, Eric Mill wrote:
>> I've actually been doing some normalization using that field as well,
>> for our Real Time Congress API. It only works for a subset of them, of
>> course, but my regexes are pretty straightforward:
>> You might want to consider retaining the original textual vote type,
>> and adding a new field with the standard vote type, so that a) it's
>> backwards compatible, and b) it lets people see the original text and
>> come up with their own normalization strategy if they want.
>> In RTC, I leave the original text as the "roll_type" field, and then
>> the normalized field as "vote_type". The feed of votes contains voice
>> votes, as pulled from GovTrack's <vote> tags on bills, whose
>> "vote_type" is always set to "passage", and which do not have a
>> -- Eric
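Eric's backwards-compatible approach amounts to adding a field rather than rewriting one. A sketch, assuming each vote is a dict with a roll_type key; the patterns below are placeholders, since the actual RTC regexes are not reproduced in this thread.

```python
import re
from typing import Dict, Optional

# Placeholder patterns, checked in order; not the real RTC regexes.
PATTERNS = [
    (re.compile(r"cloture", re.I), "cloture"),
    (re.compile(r"amendment", re.I), "amendment"),
    (re.compile(r"passage|suspend the rules and pass", re.I), "passage"),
]


def add_vote_type(vote: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return a copy with a normalized 'vote_type'; 'roll_type' is left untouched."""
    out: Dict[str, Optional[str]] = dict(vote)  # the input dict is not modified
    out["vote_type"] = None  # stays None when no pattern matches
    for pattern, label in PATTERNS:
        if pattern.search(vote["roll_type"]):
            out["vote_type"] = label
            break
    return out
```

Because the original text survives unchanged, consumers can ignore vote_type and apply their own normalization instead.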
>> On Wed, Feb 16, 2011 at 10:20 AM, Josh Tauberer <tauberer@...> wrote:
>>> Hey, all.
>>> I am thinking of normalizing the "type" field in roll call vote files.
>>> Most won't change, but I'll correct typos in the upstream data sources
>>> and I may revise some (like "Call of the House" to "Quorum Call" where
>>> appropriate). I'll also add a category attribute to the element to group
>>> similar vote types together (on passage, suspend the rules and pass, on
>>> agreeing to the resolution, on the amendment, etc. all labeled e.g.
>>> This is fair warning that I'll be changing that field on existing data
>>> real soon now. If that'll cause any problems for you (it really
>>> shouldn't), speak now!
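Applied to an existing roll call file, the change rewrites the <type> text and sets a category attribute on the same element. A sketch using Python's xml.etree.ElementTree; the root element name, the lookup tables, and the unconditional "Call of the House" rename are assumptions (the message says that rename applies only where appropriate).

```python
import xml.etree.ElementTree as ET

# Hypothetical tables; the real rename is conditional ("where appropriate").
TYPE_FIXES = {"Call of the House": "Quorum Call"}
CATEGORIES = {"On Passage": "passage", "On the Amendment": "amendment"}


def migrate(xml_text: str) -> str:
    """Rewrite <type> in place and add a category attribute to it."""
    root = ET.fromstring(xml_text)
    type_el = root.find("type")  # first direct child named <type>
    if type_el is not None and type_el.text:
        original = type_el.text.strip()
        fixed = TYPE_FIXES.get(original, original)
        type_el.text = fixed
        type_el.set("category", CATEGORIES.get(fixed, "other"))
    return ET.tostring(root, encoding="unicode")
```

For example, `migrate("<roll><type>Call of the House</type></roll>")` returns a document whose <type> reads "Quorum Call" and carries a category attribute.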
>>> - Josh Tauberer