What is the best way to fetch votes?

  • Neil Drumm
    Message 1 of 5 , Oct 23, 2007
      Every day we:
      - Fetch bills.index.xml
      - Compare each bill's last action date to our copy; if they differ, a
      bill update is queued.
      - On bill update, the vote list is fetched from votes_download_xml.xpd.
      - Each individual roll vote XML is fetched.

      I hear votes are updated more frequently; what is the best way to find
      which votes have been added? My algorithm is limited to daily because
      of the last action date comparison.
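
The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: it assumes the dates from bills.index.xml have already been parsed into a dictionary, and the function name and the bill identifiers in the usage note are hypothetical.

```python
from datetime import date


def bills_needing_update(remote, local):
    """Return the IDs of bills whose last action date differs from our copy.

    `remote` and `local` both map a bill ID to its last action date.
    A bill that is missing locally is also queued, since local.get()
    returns None for it and None never equals a date.
    """
    queue = []
    for bill_id, last_action in remote.items():
        if local.get(bill_id) != last_action:
            queue.append(bill_id)
    return sorted(queue)
```

For example, with hypothetical IDs, `bills_needing_update({"h1": date(2007, 10, 23)}, {"h1": date(2007, 10, 22)})` queues `"h1"`. Because the comparison only sees dates, it cannot notice more than one change per day, which is the limitation raised in the question.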

      --
      Neil Drumm
      http://delocalizedham.com
    • Kevin Henry
      Message 2 of 5 , Oct 23, 2007
        Hey Neil,

        I think the easiest way to keep up to date is just to use rsync for
        both the bill and vote directories - it will automatically check the
        (file) dates and only download the files that have changed. See
        http://www.govtrack.us/source.xpd.

        There are some files that don't seem to be managed by the rsync server
        (like bills.technorati.xml), so for those I just use wget, which also
        checks the date before downloading.
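
For anyone staying on plain HTTP, a date check along these lines can be approximated with a conditional request. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes the server honours the `If-Modified-Since` header and that the local file's modification time is a reasonable stand-in for when it was last fetched.

```python
import os
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_headers(mtime):
    """Build request headers asking only for content newer than `mtime`.

    `mtime` is a POSIX timestamp, or None when there is no local copy yet.
    """
    if mtime is None:
        return {}
    return {"If-Modified-Since": formatdate(mtime, usegmt=True)}


def fetch_if_newer(url, dest):
    """Download `url` to `dest` unless the server reports it unchanged.

    Returns True if a new copy was written, False on HTTP 304.
    """
    mtime = os.path.getmtime(dest) if os.path.exists(dest) else None
    request = urllib.request.Request(url, headers=conditional_headers(mtime))
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the local copy
            return False
        raise
    with open(dest, "wb") as out:
        out.write(body)
    return True
```

Calling `fetch_if_newer(url, "people.xml")` from a daily cron script would then skip the download whenever the file has not changed on the server.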

        And if you're doing anything with the people data, make sure you're
        also wgetting people.xml in your daily update. I forgot about this at
        first and slowly got out of sync.

        I do all of the above using a daily cron script, but I don't know if
        the votes are getting updated more frequently.

        Good luck...

        Kevin
        http://www.whereabill.org/


        --- In govtrack@yahoogroups.com, "Neil Drumm" <drumm@...> wrote:
        >
        > Every day we:
        > - Fetch bills.index.xml
        > - Compare each bill's last action date to our copy, if different, a
        > bill update is queued.
        > - On bill update, the vote list is fetched from votes_download_xml.xpd.
        > - Each individual roll vote XML is fetched.
        >
        > I hear votes are updated more frequently; what is the best way to find
        > which votes have been added? My algorithm is limited to daily because
        > of the last action date comparison.
        >
        > --
        > Neil Drumm
        > http://delocalizedham.com
        >
      • Josh Tauberer
        Message 3 of 5 , Oct 24, 2007
          Kevin Henry wrote:
          > I think the easiest way to keep up to date is just to use rsync for
          > both the bill and vote directories - it will automatically check the
          > (file) dates and only download the files that have changed. See
          > http://www.govtrack.us/source.xpd.

          Right.

          For reference, the file data/us/110/votes.all.index.xml is in danger of
          being dropped. I dropped it in January, brought it back, and now I don't
          use it anymore so I will probably get rid of it again.

          > There are some files that don't seem to be managed by the rsync server
          > (like bills.technorati.xml), so for those I just use wget, which also
          > checks the date before downloading.

          That's weird. Everything in /data should work under rsync. Are you sure
          something is wrong there?

          > And if you're doing anything with the people data, make sure you're
          > also wgetting people.xml in your daily update. I forgot about this at
          > first and slowly got out of sync.

          Very true.

          > I do all of the above using a daily cron script, but I don't know if
          > the votes are getting updated more frequently.

          Yes, now every 15 minutes (just roll call votes). THOMAS is checked at
          8:10am and noon for all other updates, these days.

          --
          - Josh Tauberer

          http://razor.occams.info

          "Yields falsehood when preceded by its quotation! Yields
          falsehood when preceded by its quotation!" Achilles to
          Tortoise (in "Gödel, Escher, Bach" by Douglas Hofstadter)
        • Neil Drumm
          Message 4 of 5 , Oct 24, 2007
            On 10/24/07, Josh Tauberer <tauberer@...> wrote:
            >
            > Kevin Henry wrote:
            > > I think the easiest way to keep up to date is just to use rsync for
            > > both the bill and vote directories - it will automatically check the
            > > (file) dates and only download the files that have changed. See
            > > http://www.govtrack.us/source.xpd.
            >
            > Right.
            >
            > For reference, the file data/us/110/votes.all.index.xml is in danger of
            > being dropped. I dropped it in January, brought it back, and now I don't
            > use it anymore so I will probably get rid of it again.

            Yes, I had been using this previously, but have removed dependence on it.

            Right now everything is run over HTTP. The main advantage is that
            there is less to set up and less that might fail. Keeping a local
            copy via rsync is possible, but it would take a bit of effort and
            comes with a different set of advantages and disadvantages.

            I suppose an alternative might be parsing
            http://www.govtrack.us/data/us/110/rolls/ periodically.
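
A rough sketch of that idea, assuming the URL returns an ordinary HTML directory index with one link per file (not confirmed in this thread). The class and function names are hypothetical, and the sketch only diffs file names against what has already been fetched; it does not detect files that changed in place.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def new_xml_files(listing_html, known):
    """Return the .xml file names in the listing that are not in `known`."""
    parser = LinkCollector()
    parser.feed(listing_html)
    found = {href for href in parser.hrefs if href.endswith(".xml")}
    return sorted(found - set(known))
```

Run every 15 minutes or so against the fetched index page, this would yield only the roll vote files that still need downloading.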

            > > There are some files that don't seem to be managed by the rsync server
            > > (like bills.technorati.xml), so for those I just use wget, which also
            > > checks the date before downloading.
            >
            > That's weird. Everything in /data should work under rsync. Are you sure
            > something is wrong there?
            >
            > > And if you're doing anything with the people data, make sure you're
            > > also wgetting people.xml in your daily update. I forgot about this at
            > > first and slowly got out of sync.
            >
            > Very true.

            We have this automated, but not scheduled. I was thinking of doing it weekly.

            --
            Neil Drumm
            http://delocalizedham.com
          • Kevin Henry
            Message 5 of 5 , Oct 24, 2007
              > > There are some files that don't seem to be managed by the rsync server
              > > (like bills.technorati.xml), so for those I just use wget, which also
              > > checks the date before downloading.
              >
              > That's weird. Everything in /data should work under rsync. Are you sure
              > something is wrong there?

              You're probably right, it was many months ago that I ran into a
              problem and switched to wget, and I haven't tried it since.


              Kevin
              http://www.whereabill.org/