
Re: [govtrack] Re: Signed vs Enacted in bill XML?

  • Josh Tauberer
    Message 1 of 5, Apr 5, 2010
      This is one of those cases where a deep screen-scrape of THOMAS goes
      wrong because its interface isn't clear. GovTrack had been using
      THOMAS's All Information page to get the sponsor, cosponsors, and
      actions (the actions are what determine a bill's status). This page
      has long had a bug where it gets truncated if it is too long:

      http://thomas.loc.gov/cgi-bin/bdquery/z?d111:HR04872:@@@L&summ2=m&

      Actually, I use the URL below, which isn't linked from the site. It
      gives the same page minus the summary, so the truncation kicks in
      later, and I get the summary with a separate fetch of the CRS Summary
      page:

      http://thomas.loc.gov/cgi-bin/bdquery/z?d111:HR04872:@@@L

      I reported this bug to them probably four years ago and never heard back
      (except maybe for an initial acknowledgment).

      Since the last few action lines were cut off, the scraper never got a
      chance to see that the bill was enacted.
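A minimal sketch of why truncation makes the status go backwards, assuming a simplified model in which the status is taken from the most advanced action the scraper actually saw. The action names and the status mapping below are hypothetical illustrations (only `PASSED:BILL` appears in this thread), not GovTrack's real scraper code.

```python
# Hypothetical, simplified mapping from scraped action kinds to a bill
# status, ordered from least to most advanced. Illustrative only.
ACTION_ORDER = [
    ("introduced", "INTRODUCED"),
    ("passed_house", "PASS_OVER:HOUSE"),
    ("passed_senate", "PASSED:BILL"),
    ("signed", "ENACTED:SIGNED"),
]

def derive_status(actions):
    """Return the status of the most advanced action present, or None."""
    status = None
    for action, label in ACTION_ORDER:
        if action in actions:
            status = label  # later entries in ACTION_ORDER win
    return status

full = ["introduced", "passed_house", "passed_senate", "signed"]
truncated = full[:3]  # the page was cut off before the last action

print(derive_status(full))       # ENACTED:SIGNED
print(derive_status(truncated))  # PASSED:BILL
```

If the page is cut off before the signing action, nothing downstream can tell the difference between "not yet signed" and "signed, but not scraped", which matches the backtrack Eric reports below.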

      Anyway, I've now switched to the All Congressional Action page for the
      sponsor and actions, which doesn't have this problem (or, if it does,
      its output is shorter, so at least in this case it isn't cut off):

      http://thomas.loc.gov/cgi-bin/bdquery/z?d111:HR04872:@@@X

      Plus a second fetch of the cosponsors-specific page.
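The THOMAS URLs quoted above share one pattern: congress number, bill type plus number, then a page code after `@@@`. A minimal sketch of building them, assuming the bill number is always zero-padded to five digits (inferred only from the single example `HR04872`). The function and constant names are hypothetical, and the cosponsors page is left out because its URL isn't given here.

```python
# Base of the THOMAS bill-query URLs quoted in this thread.
BASE = "http://thomas.loc.gov/cgi-bin/bdquery/z"

# Page codes seen in the URLs above (names are hypothetical).
ALL_INFO_NO_SUMMARY = "L"  # All Information, minus the summary
ALL_ACTIONS = "X"          # All Congressional Action

def thomas_url(congress: int, bill_type: str, number: int, page: str) -> str:
    """Build a URL like ...z?d111:HR04872:@@@X.

    Assumption: the bill number is zero-padded to five digits."""
    return f"{BASE}?d{congress}:{bill_type.upper()}{number:05d}:@@@{page}"

print(thomas_url(111, "hr", 4872, ALL_ACTIONS))
# http://thomas.loc.gov/cgi-bin/bdquery/z?d111:HR04872:@@@X
```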

      That brings the total number of page fetches per bill to 8, and since
      THOMAS asks for no more than one request per second, that is why it
      takes so long to update bill info.
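A minimal sketch of a scraper-side rate limiter that enforces the one-request-per-second rule. The class and its injectable `fetch`, `clock`, and `sleep` parameters are a hypothetical design, not GovTrack's code; injecting them lets the pacing be checked without network access or real waiting.

```python
import time
from typing import Callable, Optional

class RateLimitedFetcher:
    """Call `fetch` at most once per `min_interval` seconds (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], str], min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.fetch = fetch
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None  # start time of the previous request

    def get(self, url: str) -> str:
        if self._last is not None:
            # Wait out whatever remains of the minimum interval.
            wait = self.min_interval - (self.clock() - self._last)
            if wait > 0:
                self.sleep(wait)
        self._last = self.clock()
        return self.fetch(url)
```

With 8 fetches per bill, the first request can go immediately, but each of the other 7 waits up to a second, so a bill costs at least about 7 seconds. For example, 1,000 bills need at least 7,000 seconds (just under two hours) of enforced waiting alone.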

      I'll re-run the scrape on all bills this session (and later on earlier
      sessions) to make sure this doesn't cause problems on other bills.

      Thanks,

      - Josh Tauberer
      - CivicImpulse / GovTrack.us

      http://razor.occams.info | www.govtrack.us | civicimpulse.com

      "Members of both sides are reminded not to use guests of the
      House as props."

      On 04/05/2010 04:23 PM, Eric Mill wrote:
      > There's some kind of issue here - the bill status has now backtracked
      > to just PASSED:BILL, and the <signed> element is missing entirely.
      > http://www.govtrack.us/data/us/111/bills/h4872.xml
      >
      > And the GovTrack page for the bill now has the Signed By President
      > checkbox as unchecked:
      > http://www.govtrack.us/congress/bill.xpd?bill=h111-4872
      >
      > -- Eric
      >
      > On Wed, Mar 31, 2010 at 3:48 PM, Eric Mill <eric@...> wrote:
      >> Is there a delay between when a bill is signed into law, and gets a
      >> <signed> item in its <actions> history, and when it gets an <enacted>
      >> item? I ask because the reconciliation bill, HR 4872, has <signed>
      >> but not <enacted> in its <actions> history.
      >>
      >> The reconciliation bill:
      >> http://www.govtrack.us/data/us/111/bills/h4872.xml
      >>
      >> Versus the senate health care bill:
      >> http://www.govtrack.us/data/us/111/bills/h3590.xml
      >>
      >> -- Eric
      >>
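Eric's question (a <signed> item but no <enacted> item) can be checked mechanically against a bill's XML. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming a simplified layout in which <signed> and <enacted> are direct children of <actions>. The element names come from this thread, but the sample document, its layout, and the helper names are hypothetical, not GovTrack's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal bill document; the real GovTrack files carry
# more elements and attributes than this.
SAMPLE = """<bill>
  <actions>
    <action/>
    <signed/>
  </actions>
</bill>"""

def action_kinds(xml_text: str) -> set:
    """Return the tag names of the direct children of <actions>."""
    root = ET.fromstring(xml_text)
    actions = root.find("actions")
    return {child.tag for child in actions} if actions is not None else set()

def signed_but_not_enacted(xml_text: str) -> bool:
    """True when <actions> has a <signed> item but no <enacted> item."""
    kinds = action_kinds(xml_text)
    return "signed" in kinds and "enacted" not in kinds

print(signed_but_not_enacted(SAMPLE))  # True
```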