Re: [govtrack] xml data

  • Joshua Tauberer
    Message 1 of 6 , Nov 5, 2004
      Ed Summers wrote:
      > Hi there, I just was referred to govtrack via a colleague on the
      > govdoc-l list [1]. I've been researching creating RSS feeds for
      > congressional votes, and noticed that you've got a ton of data in XML on
      > your site.

      Congrats on making the first post to this list. :)

      Anyway, yes, a ton of data is an understatement. In case you didn't see
      the tiny link on the events page, there is an RSS feed for all
      bill-passage votes:

      http://www.govtrack.us/users/events.xpd?monitors=misc:allvotes
      (Click the XML link at the top-right.)

      > Can I ask what your data sources are, and how you end up with this XML?

      From the About page: "GovTrack gets its information from THOMAS, the
      official website for the status of legislation at the Library of
      Congress, and the U.S. Senate and U.S. House websites, the official
      sources for voting records. The committee hearing schedule is obtained
      through this Senate page. There is no centralized House committee
      schedule as far as I am aware, so that information is not currently
      available on GovTrack."

      All of the data is screen-scraped, which means the website pulls up
      these websites as if it were a human, and then does some pattern
      matching on the pages to find the information it needs. It's an
      unfortunate but necessary method. The site takes the information and
      throws together the XML files. It puts RSS feeds together dynamically
      when they are requested.
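
To illustrate the approach described above (fetch a page, pattern-match the fields, emit XML), here is a minimal Python sketch. It is not GovTrack's code: the real crawlers are Perl and unpublished, and the page markup, the sample rows, the regular expression, and the names `scrape_votes` and `votes_to_xml` are all assumptions made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Invented page fragment standing in for a fetched vote listing.
# The real THOMAS/Senate/House markup is not shown in this thread.
SAMPLE_HTML = """
<table>
<tr><td>Bill A</td><td>Passed</td><td>3-1</td></tr>
<tr><td>Bill B</td><td>Failed</td><td>2-2</td></tr>
</table>
"""

# One table row = one vote: bill name, result, "ayes-nays" tally.
ROW_PATTERN = re.compile(
    r"<tr><td>(?P<bill>[^<]+)</td>"
    r"<td>(?P<result>[^<]+)</td>"
    r"<td>(?P<ayes>\d+)-(?P<nays>\d+)</td></tr>"
)


def scrape_votes(html):
    """Return one dict per row that matches ROW_PATTERN."""
    return [
        {
            "bill": m.group("bill"),
            "result": m.group("result"),
            "ayes": int(m.group("ayes")),
            "nays": int(m.group("nays")),
        }
        for m in ROW_PATTERN.finditer(html)
    ]


def votes_to_xml(votes):
    """Serialize the scraped votes as a small XML document (hypothetical schema)."""
    root = ET.Element("votes")
    for v in votes:
        ET.SubElement(
            root,
            "vote",
            bill=v["bill"],
            result=v["result"],
            ayes=str(v["ayes"]),
            nays=str(v["nays"]),
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(votes_to_xml(scrape_votes(SAMPLE_HTML)))
```

A pattern like this breaks as soon as the source page changes its layout, which is one reason screen-scraping is an unfortunate but necessary method.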

      If I'm missing the kind of feed you were looking to create, I would like
      to add it. You're also welcome to use the raw data (see the link below)
      to create your own feeds.

      http://www.govtrack.us/source.xpd

      --
      - Joshua Tauberer

      http://taubz.for.net

      ** Nothing Unreal Exists **
    • Ed Summers
      Message 2 of 6 , Nov 6, 2004
        Thanks for the info Joshua. Your service is really very nice. Just out
        of curiosity how hairy are your Thomas crawlers/parsers? You used Mono
        for all of it? Is source code available?

        That said, I highly doubt it would be worth the effort to replicate this
        parsing work. A small group of friends were interested in reproducing
        theyworkforyou.com for the US audience.

        One last question, what do you use to generate those voting maps?

        //Ed
      • Joshua Tauberer
        Message 3 of 6 , Nov 6, 2004
          Ed Summers wrote:
          > Thanks for the info Joshua. Your service is really very nice. Just out
          > of curiosity how hairy are your Thomas crawlers/parsers? You used Mono
          > for all of it? Is source code available?

          The crawlers are written in Perl. They're pretty hairy. I haven't
          posted the source for that, and I probably won't for some time.

          > That said, I highly doubt it would be worth the effort to replicate this
          > parsing work. A small group of friends were interested in reproducing
          > theyworkforyou.com for the US audience.

          Right, I hadn't seen that site when I started working on this, but that
          was definitely my goal too. There's still a lot of ground I haven't
          covered, so there are still aspects of all of this that would be worth
          someone undertaking. (Better voting record comparisons, contacting
          Congress, public discussion of bills/representatives, more statistical
          analyses, comprehensive biographical data...)

          I definitely encourage you to stick with your original motivation to do
          something -- if you start to work on something, I'd be interested in
          following how it goes. And since all of GovTrack's XML data is
          available for anyone to use, you can skip the parsing part.

          > One last question, what do you use to generate those voting maps?

          GMT: http://gmt.soest.hawaii.edu/

          It was the first thing I came across that would do it. Not easy to use
          at all, and it only outputs postscript, so I pass the output through
          ghostscript. Plus I have a database mapping congressional districts to
          latitude/longitudes. I don't remember where I found it, but I can send
          it to you if you'd like.

          --
          - Joshua Tauberer

          http://taubz.for.net

          ** Nothing Unreal Exists **
        • Wilsongreg
          Message 4 of 6 , Nov 19, 2004
            --- In govtrack@yahoogroups.com, Joshua Tauberer <tauberer@f...> wrote:

            <snip>

            this looks like a great tool. i look forward to investigating it
            further. thanks for your work!

            > Plus I have a database mapping congressional districts to
            > latitude/longitudes. I don't remember where I found it, but I can send
            > it to you if you'd like.

            do you have a similar mapping of zip codes to congressional districts?

            thanks!

            greg wilson


            >
            > --
            > - Joshua Tauberer
            >
            > http://taubz.for.net
            >
            > ** Nothing Unreal Exists **
          • Joshua Tauberer
            Message 5 of 6 , Nov 19, 2004
              Wilsongreg wrote:
              > --- In govtrack@yahoogroups.com, Joshua Tauberer <tauberer@f...> wrote:
              > > Plus I have a database mapping congressional districts to
              > > latitude/longitudes. I don't remember where I found it, but I can send
              > > it to you if you'd like.
              >
              > do you have a similar mapping of zip codes to congressional districts?

              Heh, yeah, it's actually all part of the same database. I've posted it
              at the address below, but note that it's going to be obsolete in January.

              http://www.govtrack.us/data/misc/cities.csv

              The columns are mostly evident: state, city, congressional district,
              something weird with the zip code, alternate names, zipcode, area code,
              latitude, longitude.
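
For anyone wanting to use the file, the column list above could be read along these lines. This is a rough Python sketch under several assumptions: that the file is comma-delimited with no header row, and that the columns appear in exactly the order listed. The field names (`zip_misc` for the "something weird" column, and so on), the sample rows, and the helper names are invented for illustration.

```python
import csv
import io

# Column order as described in the message; the names themselves are made up.
COLUMNS = [
    "state",
    "city",
    "district",
    "zip_misc",   # the "something weird with the zip code" column
    "alt_names",
    "zipcode",
    "area_code",
    "latitude",
    "longitude",
]

# Invented rows standing in for the real cities.csv contents.
SAMPLE_CSV = (
    "XX,Exampleville,1,0,,00000,000,10.5,-20.25\n"
    "XX,Sampletown,2,0,Sample Town,00001,000,11.0,-21.0\n"
)


def load_rows(text):
    """Parse headerless CSV text into a list of dicts keyed by COLUMNS."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=COLUMNS))


def zip_to_districts(rows):
    """Map each zip code to the set of (state, district) pairs it appears with."""
    mapping = {}
    for row in rows:
        mapping.setdefault(row["zipcode"], set()).add(
            (row["state"], row["district"])
        )
    return mapping


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    print(zip_to_districts(rows))
```

The lookup keeps a set per zip code rather than a single value, since a zip code may show up on more than one row.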

              I don't have a database yet with the redistricting for 2005. (If you
              happen to come across one, let me know.)

              --
              - Joshua Tauberer

              http://taubz.for.net

              ** Nothing Unreal Exists **