xml data

  • Ed Summers
    Message 1 of 6, Nov 5, 2004
      Hi there, I just was referred to govtrack via a colleague on the
      govdoc-l list [1]. I've been researching creating RSS feeds for
      congressional votes, and noticed that you've got a ton of data in XML on
      your site.

      Can I ask what your data sources are, and how you end up with this XML?

      //Ed

      [1] http://docs.lib.duke.edu/federal/govdoc-l/index.html

      --
      Ed Summers
      aim: inkdroid
      web: http://www.inkdroid.org

      If a system is to serve the creative spirit, it must be entirely comprehensible
      to a single individual.
      [Daniel Ingalls]
    • Joshua Tauberer
      Message 2 of 6, Nov 5, 2004
        Ed Summers wrote:
        > Hi there, I just was referred to govtrack via a colleague on the
        > govdoc-l list [1]. I've been researching creating RSS feeds for
        > congressional votes, and noticed that you've got a ton of data in XML on
        > your site.

        Congrats on making the first post to this list. :)

        Anyway, yes, a ton of data is an understatement. In case you didn't see
        the tiny link on the events page, there is an RSS feed for all
        bill-passage votes:

        http://www.govtrack.us/users/events.xpd?monitors=misc:allvotes
        (Click the XML link at the top-right.)

        > Can I ask what your data sources are, and how you end up with this XML?

        From the About page: "GovTrack gets its information from THOMAS, the
        official website for the status of legislation at the Library of
        Congress, and the U.S. Senate and U.S. House websites, the official
        sources for voting records. The committee hearing schedule is obtained
        through this Senate page. There is no centralized House committee
        schedule as far as I am aware, so that information is not currently
        available on GovTrack."

        All of the data is screen-scraped, which means the website pulls up
        these websites as if it were a human, and then does some pattern
        matching on the pages to find the information it needs. It's an
        unfortunate but necessary method. The site takes the information and
        throws together the XML files. It puts RSS feeds together dynamically
        when they are requested.
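
A minimal sketch of the screen-scraping step described above, in Python rather than the language GovTrack's own crawlers use. The HTML fragment, the row pattern, and the output element names are all hypothetical; the real THOMAS, House, and Senate pages are laid out differently and would need their own patterns.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for a fetched roll-call page.
SAMPLE_PAGE = """
<table>
<tr><td>Smith (D-NY)</td><td>Yea</td></tr>
<tr><td>Jones (R-TX)</td><td>Nay</td></tr>
<tr><td>Lee (I-VT)</td><td>Not Voting</td></tr>
</table>
"""

# One table row: member name, party letter, two-letter state, vote cast.
ROW_PATTERN = re.compile(
    r"<tr><td>(?P<name>[^<(]+?)\s*\((?P<party>[A-Z])-(?P<state>[A-Z]{2})\)</td>"
    r"<td>(?P<vote>[^<]+)</td></tr>"
)


def scrape_votes(html):
    """Pattern-match each row of the page into a dict of fields."""
    return [match.groupdict() for match in ROW_PATTERN.finditer(html)]


def votes_to_xml(votes):
    """Throw the scraped fields together into a small XML document."""
    root = ET.Element("votes")
    for vote in votes:
        element = ET.SubElement(
            root, "vote",
            name=vote["name"], party=vote["party"], state=vote["state"],
        )
        element.text = vote["vote"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(votes_to_xml(scrape_votes(SAMPLE_PAGE)))
```

Pattern matching like this breaks whenever the source page changes its markup, which is why it is a method of last resort.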

        If I'm missing the kind of feed you were looking to create, I would like
        to add it. You're also welcome to use the raw data (see the link below)
        to create your own feeds.

        http://www.govtrack.us/source.xpd

        --
        - Joshua Tauberer

        http://taubz.for.net

        ** Nothing Unreal Exists **
      • Ed Summers
        Message 3 of 6, Nov 6, 2004
          Thanks for the info Joshua. Your service is really very nice. Just out
          of curiosity how hairy are your Thomas crawlers/parsers? You used Mono
          for all of it? Is source code available?

          That said, I highly doubt it would be worth the effort to replicate this
          parsing work. A small group of friends were interested in reproducing
          theyworkforyou.com for the US audience.

          One last question, what do you use to generate those voting maps?

          //Ed
        • Joshua Tauberer
          Message 4 of 6, Nov 6, 2004
            Ed Summers wrote:
            > Thanks for the info Joshua. Your service is really very nice. Just out
            > of curiosity how hairy are your Thomas crawlers/parsers? You used Mono
            > for all of it? Is source code available?

            The crawlers are written in Perl. They're pretty hairy. I haven't
            posted the source for that, and I probably won't for some time.

            > That said, I highly doubt it would be worth the effort to replicate this
            > parsing work. A small group of friends were interested in reproducing
            > theyworkforyou.com for the US audience.

            Right, I hadn't seen that site when I started working on this, but that
            was definitely my goal too. There's still a lot of ground I haven't
            covered, so there are still aspects of all of this that would be worth
            someone undertaking. (Better voting record comparisons, contacting
            Congress, public discussion of bills/representatives, more statistical
            analyses, comprehensive biographical data...)

            I definitely encourage you to stick with your original motivation to do
            something -- if you start to work on something, I'd be interested in
            following how it goes. And since all of GovTrack's XML data is
            available for anyone to use, you can skip the parsing part.

            > One last question, what do you use to generate those voting maps?

            GMT: http://gmt.soest.hawaii.edu/

            It was the first thing I came across that would do it. Not easy to use
            at all, and it only outputs PostScript, so I pass the output through
            Ghostscript. Plus I have a database mapping congressional districts to
            latitudes/longitudes. I don't remember where I found it, but I can send
            it to you if you'd like.
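
A rough sketch of the PostScript-to-image step, assuming the `gs` executable is on the PATH. The thread doesn't say which output format or resolution is used; PNG at 150 dpi and the file names below are assumptions for illustration only.

```python
import subprocess


def ghostscript_command(ps_path, png_path, dpi=150):
    """Build a Ghostscript command line that renders PostScript to PNG."""
    return [
        "gs",
        "-dBATCH",            # exit after processing the input file
        "-dNOPAUSE",          # do not prompt between pages
        "-dSAFER",            # restrict file access from within PostScript
        "-sDEVICE=png16m",    # 24-bit colour PNG output (assumed format)
        f"-r{dpi}",           # rendering resolution in dots per inch
        f"-sOutputFile={png_path}",
        ps_path,
    ]


def convert(ps_path, png_path, dpi=150):
    """Run Ghostscript; raises CalledProcessError if the conversion fails."""
    subprocess.run(ghostscript_command(ps_path, png_path, dpi), check=True)


if __name__ == "__main__":
    # Hypothetical file names; the map itself would come from GMT.
    print(" ".join(ghostscript_command("votemap.ps", "votemap.png")))
```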

            --
            - Joshua Tauberer

            http://taubz.for.net

            ** Nothing Unreal Exists **
          • Wilsongreg
            Message 5 of 6, Nov 19, 2004
              --- In govtrack@yahoogroups.com, Joshua Tauberer <tauberer@f...> wrote:

              <snip>

              this looks like a great tool. i look forward to investigating it
              further. thanks for your work!

              > Plus I have a database mapping congressional districts to
              > latitudes/longitudes. I don't remember where I found it, but I can send
              > it to you if you'd like.

              do you have a similar mapping of zip codes to congressional districts?

              thanks!

              greg wilson


              >
              > --
              > - Joshua Tauberer
              >
              > http://taubz.for.net
              >
              > ** Nothing Unreal Exists **
            • Joshua Tauberer
              Message 6 of 6, Nov 19, 2004
                Wilsongreg wrote:
                > --- In govtrack@yahoogroups.com, Joshua Tauberer <tauberer@f...> wrote:
                > > Plus I have a database mapping congressional districts to
                > > latitudes/longitudes. I don't remember where I found it, but I can send
                > > it to you if you'd like.
                >
                > do you have a similar mapping of zip codes to congressional districts?

                Heh, yeah, it's actually all part of the same database. I've posted it
                at the address below, but note that it's going to be obsolete in January.

                http://www.govtrack.us/data/misc/cities.csv

                The columns are mostly evident: state, city, congressional district,
                something weird with the ZIP code, alternate names, ZIP code, area code,
                latitude, longitude.
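
A small sketch of how a file with those columns could be turned into a ZIP-code-to-district lookup. It assumes the columns appear in exactly the order listed, with no header row; the field names and the sample rows are made up for illustration and are not taken from cities.csv.

```python
import csv
import io
from collections import defaultdict

# Assumed column order, following the description above. The names are
# hypothetical labels; "zip_extra" is the unexplained ZIP-related column.
COLUMNS = [
    "state", "city", "district", "zip_extra", "alt_names",
    "zipcode", "area_code", "latitude", "longitude",
]

# Invented rows in the assumed layout (not real data).
SAMPLE_CSV = """AA,Alphaville,1,0,,00001,555,10.0,-20.0
AA,Betatown,1,0,,00002,555,10.5,-20.5
BB,Gammaburg,3,0,,00003,556,30.0,-40.0
BB,Gammaburg,4,0,,00003,556,30.0,-40.0
"""


def load_rows(fileobj):
    """Read every CSV row into a dict keyed by the assumed column names."""
    return list(csv.DictReader(fileobj, fieldnames=COLUMNS))


def districts_by_zip(rows):
    """Map each ZIP code to the set of (state, district) pairs it falls in."""
    mapping = defaultdict(set)
    for row in rows:
        mapping[row["zipcode"]].add((row["state"], row["district"]))
    return dict(mapping)


if __name__ == "__main__":
    lookup = districts_by_zip(load_rows(io.StringIO(SAMPLE_CSV)))
    for zipcode in sorted(lookup):
        print(zipcode, sorted(lookup[zipcode]))
```

A set is used per ZIP code because a single ZIP code can straddle more than one district.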

                I don't have a database yet with the redistricting for 2005. (If you
                happen to come across one, let me know.)

                --
                - Joshua Tauberer

                http://taubz.for.net

                ** Nothing Unreal Exists **