Re: [govtrack] Vote update timing

  • Josh Tauberer
    Message 1 of 4 , Dec 13, 2007
      Neil Drumm wrote:
      > What is the schedule for updating roll vote XML files, as in
      > /data/us/110/rolls/...? In particular, when is the bill element added?

      The files are now updated roughly every 15 minutes.

      The bill element comes in immediately --- it is information detected
      from the (official) source data file. So if it's missing, the source
      data page may have a mistake, or there could be a parsing mistake on my
      end. If you check the source and see a bill clearly identified but no
      bill element, let me know.
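One way to check for the missing-element case described above is sketched here. It assumes only that the roll file is XML with a `bill` element somewhere under the root; the helper name `find_bill` and the attribute names in the sample call are hypothetical, not taken from the actual GovTrack schema.

```python
import xml.etree.ElementTree as ET

def find_bill(roll_xml):
    """Return the attributes of the first <bill> element, or None if absent.

    Assumption: the bill reference is a <bill> element below the root.
    """
    root = ET.fromstring(roll_xml)
    bill = root.find(".//bill")
    return dict(bill.attrib) if bill is not None else None
```

A caller could treat `None` as "try again after the next update" rather than as a hard failure, for example `find_bill('<roll><bill type="h" number="6"/></roll>')` against a roll with no bill element at all.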

      > Our algorithm currently goes:
      > 1. Get http://www.govtrack.us/congress/votes_download_xml.xpd and look
      > for new votes.

      Wow, you could do that, but if you're going to ping by HTTP regularly,
      I'd much prefer you just fetch http://www.govtrack.us/data/us/110/rolls
      and parse the directory listing, since it involves much less processor
      overhead.
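A minimal sketch of parsing such a directory listing, assuming it is an ordinary HTML index page whose file links are `<a href="...">` entries ending in `.xml`. The names `ListingParser` and `list_xml_files` are made up for this example.

```python
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Collect href values that look like XML files from an index page."""

    def __init__(self):
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Skip sort links and parent-directory links; keep only *.xml.
            if href.endswith(".xml"):
                self.files.append(href)

def list_xml_files(listing_html):
    """Return the XML file names linked from a directory listing page."""
    parser = ListingParser()
    parser.feed(listing_html)
    return parser.files
```

The page itself could be fetched with `urllib.request.urlopen` and decoded before being passed to `list_xml_files`; comparing the result against the previous run gives the new votes.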

      > 2. For each new vote
      > 2a. Get the roll vote XML file to determine what bill to update.
      > 2b. Fully update the bill.

      Again, your best bet for updating bills is, besides rsync, parsing the
      directory listing at http://www.govtrack.us/data/us/110/bills.

      Starting very soon I think I am going to cut down severely on all of my
      government-transparency time, so I would normally offer to find a better
      solution than parsing directory listing pages, but now I won't.

      If you or anyone wanted to offer a Perl script that I could put in place
      to output, for instance, a machine-readable directory listing with
      last-modified times, I could use that.
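A rough sketch of such a listing script, written in Python rather than the Perl asked for. The output format (one tab-separated line per file, with the modification time as a UTC ISO 8601 timestamp) and the function name `listing` are assumptions for illustration, not an agreed format.

```python
import os
from datetime import datetime, timezone

def listing(directory):
    """Return 'name<TAB>mtime' lines for the regular files in a directory.

    Times are always UTC in the form YYYY-MM-DDTHH:MM:SSZ, so the format
    does not change with the age of the file.
    """
    rows = []
    with os.scandir(directory) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_file():
                mtime = datetime.fromtimestamp(entry.stat().st_mtime, tz=timezone.utc)
                rows.append(f"{entry.name}\t{mtime.strftime('%Y-%m-%dT%H:%M:%SZ')}")
    return "\n".join(rows)
```

Run from cron and written to a file next to the data, this would let clients compare timestamps without scraping an HTML index.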

      > A few weeks ago we missed a couple votes, but they worked when the
      > bill update was manually triggered. I did not catch the XML quickly
      > enough to verify, but I think the bill element might have been
      > missing, causing step 2a to fail. Or at least, that is the simplest
      > explanation.

      I'm not sure what might have happened. It's possible the vote appeared
      before the bill did.

      --
      - Josh Tauberer
      - GovTrack.us

      http://razor.occams.info

      "Yields falsehood when preceded by its quotation! Yields
      falsehood when preceded by its quotation!" Achilles to
      Tortoise (in "Gödel, Escher, Bach" by Douglas Hofstadter)
    • Josh Tauberer
      Message 2 of 4 , Dec 13, 2007
        Sam Smith wrote:
        > On Thu, 13 Dec 2007, Josh Tauberer wrote:
        >> If you or anyone wanted to offer a Perl script that I could put in place
        >> to output, for instance, a machine-readable directory listing with
        >> last-modified times, I could use that.
        >
        > cron this at the top level:
        >
        > ls -lR | gzip > ls-lR.gz

        I'm not sure that really improves on the Apache directory listing. The
        time format, for instance, varies depending on the age of the file.

        --
        - Josh Tauberer
        - GovTrack.us

        http://razor.occams.info

        "Yields falsehood when preceded by its quotation! Yields
        falsehood when preceded by its quotation!" Achilles to
        Tortoise (in "Gödel, Escher, Bach" by Douglas Hofstadter)
      • Corey Gilmore
        Message 3 of 4 , Dec 13, 2007
          --- In govtrack@yahoogroups.com, Josh Tauberer <tauberer@...> wrote:
          >
          > Neil Drumm wrote:
          <snip>
          > > Our algorithm currently goes:
          > > 1. Get http://www.govtrack.us/congress/votes_download_xml.xpd and look
          > > for new votes.
          >
          > Wow, you could do that, but if you're going to ping by HTTP regularly,
          > I'd much prefer you just fetch http://www.govtrack.us/data/us/110/rolls
          > and parse the directory listing, since it involves much less processor
          > overhead.
          >
          > > 2. For each new vote
          > > 2a. Get the roll vote XML file to determine what bill to update.
          > > 2b. Fully update the bill.
          >
          > Again, your best bet for updating bills is, besides rsync, parsing the
          > directory listing at http://www.govtrack.us/data/us/110/bills.
          >
          > Starting very soon I think I am going to cut down severely on all of my
          > government-transparency time, so I would normally offer to find a better
          > solution than parsing directory listing pages, but now I won't.
          >
          > If you or anyone wanted to offer a Perl script that I could put in place
          > to output, for instance, a machine-readable directory listing with
          > last-modified times, I could use that.
          >

          If it's at all possible for you I'd highly recommend switching to
          rsync. My process is:
          * Run rsync on the votes dir, get the latest votes
          * Load a copy of my local votes directory listing into memory
          (www.php.net/scandir)
          * Grab the IDs of votes I've imported from my db (essentially select
          concat(vote_id, '.xml') as vote from votes) and put them into an array
          * $import = array_diff($files, $votes); (www.php.net/array_diff)
          * Process the list of votes to import. It's not something you're
          running that often, and array_diff isn't that CPU intensive with the
          relatively few votes you see in a typical year.
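The same comparison can be sketched in Python with a set difference standing in for PHP's `array_diff`. The function name `votes_to_import` and the sample IDs are hypothetical; the only thing taken from the steps above is that an imported vote ID plus `.xml` matches a file name in the synced directory.

```python
def votes_to_import(local_files, imported_ids):
    """Return the vote files present locally but not yet imported.

    local_files: file names from the rsync'd votes directory.
    imported_ids: vote IDs already stored in the database.
    """
    already_imported = {f"{vote_id}.xml" for vote_id in imported_ids}
    candidates = {name for name in local_files if name.endswith(".xml")}
    return sorted(candidates - already_imported)
```

In practice `local_files` could come from `os.listdir` on the local votes directory after rsync finishes, and `imported_ids` from the database query.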