Re: [govtrack] dataset for data mining

  • Neil Drumm
    Message 1 of 6 , May 1, 2008
      On Wed, Apr 30, 2008 at 10:46 PM, storiesbythefire <tomdeme@...> wrote:
      >
      > Has anyone on the list created a nice script for capturing and
      > converting the govtrack xml into datasets for data mining? Or are
      > such data sets available for research use?

      I built the legislature module, http://drupal.org/project/legislature,
      for Drupal. I would not say it is particularly nice quite yet, but it
      does import politicians, bills, and votes into a MySQL (theoretically
      Postgres too) database. I built this code for a couple of client
      projects, so I have not spent too much time polishing for end-users.
      It might be a bit overkill for your task since you end up with a whole
      CMS installed too.

      The general process is
      1. Install Drupal 5.
      2. Install Import manager and Job queue modules.
      3. Install Legislature module.
      4. Enable the desired sessions; 108-110 have been tested. Visit the Menu
      admin page to clear some caches.
      5. Import politicians. You may need to raise PHP's memory or execution
      time limits. I generally run imports from the shell, which has a
      separate php.ini, using drupal.sh from Drupal 6.
      6. Import bills.
      7. Run Drupal's cron script until all queued jobs are done.
      8. Write and run queries against your local database.
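For step 8, here is a rough sketch of turning imported votes into a flat table for data mining. It is only an illustration: the `votes` table and its `politician`, `roll_id`, and `position` columns are hypothetical and are not the legislature module's real schema, the rows are made-up placeholders, and an in-memory SQLite database stands in for the local MySQL database so the example runs on its own.

```python
import csv
import io
import sqlite3


def vote_matrix(conn):
    """Pivot (politician, roll_id, position) rows into one row per politician."""
    rows = conn.execute(
        "SELECT politician, roll_id, position FROM votes "
        "ORDER BY politician, roll_id"
    ).fetchall()
    roll_ids = sorted({roll_id for _, roll_id, _ in rows})
    by_politician = {}
    for politician, roll_id, position in rows:
        by_politician.setdefault(politician, {})[roll_id] = position
    header = ["politician"] + roll_ids
    # A missing vote becomes an empty cell.
    table = [
        [name] + [positions.get(roll_id, "") for roll_id in roll_ids]
        for name, positions in sorted(by_politician.items())
    ]
    return header, table


def to_csv(header, table):
    """Serialize the matrix as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(table)
    return buf.getvalue()


# Hypothetical schema and placeholder data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (politician TEXT, roll_id TEXT, position TEXT)")
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?)",
    [
        ("Rep A", "h2008-1", "+"),
        ("Rep A", "h2008-2", "-"),
        ("Rep B", "h2008-1", "-"),
    ],
)
header, table = vote_matrix(conn)
csv_text = to_csv(header, table)
```

Against a real install you would swap the SQLite connection for a MySQL one and adjust the query to whatever tables the import actually created.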

      I am doing a presentation on this tomorrow in Sunnyvale, CA at the
      News Tools Drupal day,
      http://www.mediagiraffe.org/wiki/index.php/Jtm-sv-drupal.

      --
      Neil Drumm
      http://delocalizedham.com
    • Josh Tauberer
      Message 2 of 6 , May 1, 2008
        Neil Drumm wrote:
        > I am doing a presentation on this tomorrow in Sunnyvale, CA at the
        > News Tools Drupal day,
        > http://www.mediagiraffe.org/wiki/index.php/Jtm-sv-drupal.

        If you have slides, please post them here!

        --
        - Josh Tauberer
        - GovTrack.us

        http://razor.occams.info

        "Yields falsehood when preceded by its quotation! Yields
        falsehood when preceded by its quotation!" Achilles to
        Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
      • Neil Drumm
        Message 3 of 6 , May 1, 2008
          On Thu, May 1, 2008 at 11:04 AM, Josh Tauberer <tauberer@...> wrote:
          >
          > Neil Drumm wrote:
          > > I am doing a presentation on this tomorrow in Sunnyvale, CA at the
          > > News Tools Drupal day,
          > > http://www.mediagiraffe.org/wiki/index.php/Jtm-sv-drupal.
          >
          > If you have slides, please post them here!

          I will have some sample code once I write it. Let me know if you have
          any quick ideas for integrating GovTrack data with journalism. My
          current plan is:
          - A couple straightforward tables of votes to cover the basics.
          - Text filter for something like '<a title="S.Roll 123">the bill
          passed</a>' to '<a href="[url]" title="S.Roll 123">the bill passed
          [small sparkline-like graph]</a>'.

          --
          Neil Drumm
          http://delocalizedham.com
        • storiesbythefire
          Message 4 of 6 , May 1, 2008
            Thanks very much, Neil. Your solution will take me much closer to my
            goal.

            I've used Drupal before and even went to the 2006 Vancouver
            conference, so I'm glad to hear plugins are still seeing explosive
            growth for that great CMS.

            ~Tom
          • Neil Drumm
            Message 5 of 6 , May 5, 2008
              On Thu, May 1, 2008 at 11:04 AM, Josh Tauberer <tauberer@...> wrote:
              >
              > Neil Drumm wrote:
              > > I am doing a presentation on this tomorrow in Sunnyvale, CA at the
              > > News Tools Drupal day,
              > > http://www.mediagiraffe.org/wiki/index.php/Jtm-sv-drupal.
              >
              > If you have slides, please post them here!

              I did not use slides. The presentation was:
              * Brief tour of maplight.org
              * Brief tour of themiddleclass.org
              * Explanation of where the data comes from: THOMAS, GovTrack, and the Sunlight API
              * Demonstration of example code at
              http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/legislature/examples/
              ** Example 1 is a table of recent roll call votes by Clinton, Obama, and
              McCain. They have been missing a lot of votes.
              ** Example 2 changes
              <a title="House vote #233">House vote</a>
              into
              <a title="House vote #233"
              href="http://www.govtrack.us/congress/vote.xpd?vote=h2008-233">House
              vote</a> (passed 247 to 165)
              using Drupal's filter API, which covers all posting/blog/article/node text.
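The Example 2 rewrite can be sketched outside Drupal as a plain text transformation. This is not the module's code, just a minimal Python illustration of the same idea. The `VOTE_TOTALS` lookup, the default year, the `s` prefix for Senate votes, and the "more ayes than noes means passed" rule are all assumptions made for the sketch; the real filter would read results from the imported database.

```python
import re

# Hypothetical stand-in for the imported vote data: (chamber, year, number) -> (ayes, noes).
VOTE_TOTALS = {("h", 2008, 233): (247, 165)}

# Matches links such as <a title="House vote #233">House vote</a>.
LINK_RE = re.compile(r'<a title="(House|Senate) vote #(\d+)">(.*?)</a>', re.DOTALL)


def link_votes(text, year=2008, totals=VOTE_TOTALS):
    """Add a GovTrack href and a result summary to vote links in text."""

    def replace(match):
        chamber, number, label = match.group(1), int(match.group(2)), match.group(3)
        prefix = "h" if chamber == "House" else "s"  # "s" for Senate is an assumption
        href = f"http://www.govtrack.us/congress/vote.xpd?vote={prefix}{year}-{number}"
        link = f'<a title="{chamber} vote #{number}" href="{href}">{label}</a>'
        result = totals.get((prefix, year, number))
        if result is None:
            return link  # unknown vote: link it, but add no summary
        ayes, noes = result
        # Simplification: ignores votes that need more than a simple majority.
        outcome = "passed" if ayes > noes else "failed"
        return f"{link} ({outcome} {ayes} to {noes})"

    return LINK_RE.sub(replace, text)


filtered = link_votes('<a title="House vote #233">House vote</a>')
```

The title attribute carries no year, so the sketch takes it as a parameter; a real filter would need some other way to decide which session a vote belongs to.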

              --
              Neil Drumm
              http://delocalizedham.com