
Re: New Senator

  • severian43
    Message 1 of 10 , Feb 8, 2010
      Perhaps the easiest thing to do is to use something like wget, which only downloads the file when it's been changed (based on comparing the file modified timestamp and the Last-Modified header, I think).

      For whereabill.org, I run a daily cron bash script that does this:

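      # -N only re-downloads when the server copy is newer than the local one.
      WGET_OUTPUT=$(wget -N -P data/ http://www.govtrack.us/data/us/people.xml 2>&1)
      # wget reports "saved" only when it actually wrote a new file.
      if echo "$WGET_OUTPUT" | grep -qF 'saved'
      then
      ...do stuff...
      fi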

      So the file is only downloaded and processed if it's changed.

      Cheers,
      Kevin


      --- In govtrack@yahoogroups.com, Jack Angelo <jangelo42@...> wrote:
      >
      > the MD5 solution is a good one.
      >
      > jack
      >
      >
      > On Feb 8, 2010, at 8:16 AM, John Factorial wrote:
      >
      > > I think a "lastupdated" attribute to the <people> root element would be a nice step toward this end.
      > >
      > > Before that happens, Jack when you download people.xml from govtrack you could generate an MD5 checksum of the file and save it. The next time you download people.xml, generate the checksum again, and if they match you know the file is the same. I'm not sure what you mean when you say you're "uploading the file," but maybe this is a good solution for you.
      > >
      > > --- In govtrack@yahoogroups.com, "Jack" <jangelo42@> wrote:
      > > >
      > > > Is there a way to notice that the people.xml dataset has changed, and what changed? Currently I am uploading the file once a week to make sure I am current, but it would be better to be able to get a last modified date so I only upload when and what is new.
      > > >
      > >
      > >
      >
      > Jack Angelo
      > jangelo42@...
      >
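
The MD5 suggestion quoted above can be sketched roughly as follows. This is only an illustration, not anyone's actual script from the thread: the function names `file_md5` and `content_changed` and the `.md5` sidecar file are assumptions made up for the example. Because it compares contents rather than timestamps, it keeps working even if the file is rewritten with identical contents.

```bash
#!/usr/bin/env bash
# Sketch: detect whether a downloaded file's contents changed by comparing
# its MD5 checksum with the one saved on the previous run.

# Print the MD5 hex digest of a file (md5sum on Linux, md5 -q on macOS/BSD).
file_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d' ' -f1
  else
    md5 -q "$1"
  fi
}

# Return 0 if the file differs from the checksum stored in "<file>.md5"
# (or if no checksum has been stored yet), and store the new checksum.
# Return 1 if the contents are unchanged, 2 if the file does not exist.
content_changed() {
  local file=$1 sumfile="$1.md5" new old=""
  [ -f "$file" ] || return 2
  new=$(file_md5 "$file")
  if [ -f "$sumfile" ]; then
    old=$(cat "$sumfile")
  fi
  if [ "$new" = "$old" ]; then
    return 1
  fi
  printf '%s\n' "$new" > "$sumfile"
  return 0
}
```

After each download, something like `if content_changed data/people.xml; then ...do stuff...; fi` would then run the processing step only when the contents differ from the previous run.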
    • Aaron Swartz
      Message 2 of 10 , Feb 8, 2010
        It's very hard to collect this data; it requires a lot of calling around to offices and reading thru for odd subsections. The best source I know of is taxpayer.net although a Google search shows a couple more people have gotten into this biz.

        On Mon, Feb 8, 2010 at 3:18 PM, Paul Murphy <pmurphy@...> wrote:


        Josh,

         

        What’s the best way to compile a list of earmarks for the FY 2010 and FY 2011 budgets? Are they listed in each of the appropriations bills, or does Congress already compile a comprehensive list?

         

        Paul Murphy

         


        From: govtrack@yahoogroups.com [mailto:govtrack@yahoogroups.com] On Behalf Of Josh Tauberer
        Sent: Monday, February 08, 2010 3:07 PM
        To: govtrack@yahoogroups.com
        Cc: Jack Angelo
        Subject: Re: [govtrack] Re: New Senator

         

         

        Ok, so I'm off the hook. :)

        Since I edit the info in MySQL and then dump it to the XML file, I don't
        have a handy opportunity to update a last_modified attribute. I'd never
        remember to do it by hand anyway.

        - Josh Tauberer
        - CivicImpulse / GovTrack.us

        http://razor.occams.info | www.govtrack.us | civicimpulse.com

        "Members of both sides are reminded not to use guests of the
        House as props."

      • Josh Tauberer
        Message 3 of 10 , Feb 8, 2010
          As soon as I can fix some bug, that file is going to start to get
          written out every day, twice a day, instead of just on Sundays. The
          contents generally won't change, but the modification date will.

          (Which is why I generally say use rsync anyway.)

          - Josh Tauberer
          - CivicImpulse / GovTrack.us

          http://razor.occams.info | www.govtrack.us | civicimpulse.com

          "Members of both sides are reminded not to use guests of the
          House as props."

        • severian43
          Message 4 of 10 , Feb 8, 2010
            OK. I preferred wget to rsync in this case because it's easy to tell from the output of the command if a new file was downloaded. But there are other ways to figure it out, so I'll switch to rsync...

            Cheers,
            Kevin
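
One of the "other ways to figure it out" with rsync might look like the sketch below, which relies on rsync's `--itemize-changes` output: as far as I understand the format, lines beginning with `>f` mean a regular file was actually received, while a timestamp-only update shows up as `.f..t......`. The `RSYNC_SOURCE` variable and both function names are placeholders invented for this example; the thread does not give an rsync address, so none is filled in here.

```bash
#!/usr/bin/env bash
# Sketch: decide from rsync's itemized output whether any file content was
# actually transferred, as opposed to only a timestamp being updated.

# Read rsync --itemize-changes output on stdin; return 0 if any regular
# file was received (a line beginning with ">f"), 1 otherwise.
received_new_file() {
  grep '^>f' > /dev/null
}

# Hypothetical invocation: RSYNC_SOURCE is a placeholder, not a real endpoint.
# --checksum makes rsync compare file contents instead of size and mtime,
# so a rewritten but identical file is not transferred again.
sync_people() {
  rsync --times --checksum --itemize-changes "$RSYNC_SOURCE" data/
}
```

The cron job could then use `if sync_people | received_new_file; then ...do stuff...; fi`, keeping the same shape as the wget version.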

