
dataset for data mining

  • storiesbythefire
    Message 1 of 6 , Apr 30, 2008
      Hi,

      Has anyone on the list created a nice script for capturing and
      converting the govtrack xml into datasets for data mining? Or are
      such data sets available for research use?

      I'd simply like to find the quickest way to capture simple binary (up
      or down) info for each Congress person on each matter brought to a
      roll call vote in the most recent session (or two) of the House, to do
      some simple clustering analysis.

      Please let me know if you can help with a lead or a contact--or
      better, a dataset or script...

      Here's one example of a similar recent article that used such data:
      http://www.jstage.jst.go.jp/article/dsj/6/0/6_46/_article

      Thanks,

      Tom
      tec@...
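
A minimal sketch of the up-or-down capture asked about above, using only Python's standard library. The sample XML, the `voter` element, and the `id` and `vote` attributes with `+`/`-` codes are assumptions made for illustration; check them against the actual GovTrack roll call files before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element and attribute names are assumptions,
# not a verified copy of the GovTrack roll call schema.
SAMPLE_ROLL_XML = """
<roll where="house" roll="233">
  <voter id="400001" vote="+"/>
  <voter id="400002" vote="-"/>
  <voter id="400003" vote="0"/>
</roll>
"""

def binary_votes(xml_text):
    """Return {voter_id: 1 or 0}, skipping anything that is not a clear up/down vote."""
    root = ET.fromstring(xml_text)
    codes = {"+": 1, "-": 0}  # assumed codes for yea and nay
    result = {}
    for voter in root.iter("voter"):
        code = voter.get("vote")
        if code in codes:
            result[voter.get("id")] = codes[code]
    return result

votes = binary_votes(SAMPLE_ROLL_XML)
```

Absences and "present" votes are dropped rather than coded as 0, so a missing entry is not mistaken for a "down" vote in later clustering.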
    • Neil Drumm
      Message 2 of 6 , May 1, 2008
        On Wed, Apr 30, 2008 at 10:46 PM, storiesbythefire <tomdeme@...> wrote:
        >
        > Has anyone on the list created a nice script for capturing and
        > converting the govtrack xml into datasets for data mining? Or are
        > such data sets available for research use?

        I built the legislature module, http://drupal.org/project/legislature,
        for Drupal. I would not say it is particularly nice quite yet, but it
        does import politicians, bills, and votes into a MySQL (theoretically
        Postgres too) database. I built this code for a couple of client
        projects, so I have not spent too much time polishing for end-users.
        It might be a bit overkill for your task since you end up with a whole
        CMS installed too.

        The general process is
        1. Install Drupal 5.
        2. Install Import manager and Job queue modules.
        3. Install Legislature module.
        4. Enable the desired sessions; 108-110 have been tested. Visit the
        Menu admin page to clear some caches.
        5. Import politicians. You may need to raise PHP memory or execution
        time limits. I generally run imports from the shell, which has a
        separate php.ini, using drupal.sh from Drupal 6.
        6. Import bills.
        7. Run Drupal's cron script until all queued jobs are done.
        8. Write and run queries against your local database.
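
As a rough illustration of step 8, the sketch below pivots query results into a member-by-vote matrix suitable for clustering. It uses an in-memory SQLite database as a stand-in for the MySQL database the module fills; the `vote_record` table, its columns, and the sample rows are hypothetical and do not reflect the legislature module's real schema.

```python
import sqlite3

# Stand-in database; table name, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vote_record (person TEXT, roll TEXT, vote INTEGER)")
conn.executemany(
    "INSERT INTO vote_record VALUES (?, ?, ?)",
    [
        ("A", "h2008-1", 1), ("A", "h2008-2", 0),
        ("B", "h2008-1", 1),
        ("C", "h2008-1", 0), ("C", "h2008-2", 1),
    ],
)

def vote_matrix(conn):
    """Return (roll ids, {person: [1, 0, or None per roll]})."""
    rolls = [r[0] for r in conn.execute(
        "SELECT DISTINCT roll FROM vote_record ORDER BY roll")]
    column = {roll: i for i, roll in enumerate(rolls)}
    matrix = {}
    for person, roll, vote in conn.execute(
            "SELECT person, roll, vote FROM vote_record ORDER BY person"):
        # None marks a roll call on which this person has no recorded vote.
        row = matrix.setdefault(person, [None] * len(rolls))
        row[column[roll]] = vote
    return rolls, matrix

rolls, matrix = vote_matrix(conn)
```

Each row of `matrix` can then be fed to a clustering routine once the `None` entries are handled, for example by dropping sparse members or imputing a neutral value.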

        I am doing a presentation on this tomorrow in Sunnyvale, CA at the
        News Tools Drupal day,
        http://www.mediagiraffe.org/wiki/index.php/Jtm-sv-drupal.

        --
        Neil Drumm
        http://delocalizedham.com
      • Josh Tauberer
        Message 3 of 6 , May 1, 2008
          Neil Drumm wrote:
          > I am doing a presentation on this tomorrow in Sunnyvale, CA at the
          > News Tools Drupal day,
          > http://www.mediagiraffe.org/wiki/index.php/Jtm-sv-drupal.

          If you have slides, please post them here!

          --
          - Josh Tauberer
          - GovTrack.us

          http://razor.occams.info

          "Yields falsehood when preceded by its quotation! Yields
          falsehood when preceded by its quotation!" Achilles to
          Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
        • Neil Drumm
          Message 4 of 6 , May 1, 2008
            On Thu, May 1, 2008 at 11:04 AM, Josh Tauberer <tauberer@...> wrote:
            >
            > Neil Drumm wrote:
            > > I am doing a presentation on this tomorrow in Sunnyvale, CA at the
            > > News Tools Drupal day,
            > > http://www.mediagiraffe.org/wiki/index.php/Jtm-sv-drupal.
            >
            > If you have slides, please post them here!

            I will have some sample code once I write it. Let me know if you have
            any quick ideas for integrating GovTrack data with journalism. My
            current plan is:
            - A couple straightforward tables of votes to cover the basics.
            - Text filter for something like '<a title="S.Roll 123">the bill
            passed</a>' to '<a href="[url]" title="S.Roll 123">the bill passed
            [small sparkline-like graph]</a>'.

            --
            Neil Drumm
            http://delocalizedham.com
          • storiesbythefire
            Message 5 of 6 , May 1, 2008
              Thanks very much, Neil. Your solution will take me much closer to my
              goal.

              I've used Drupal before and even went to the 2006 Vancouver
              conference, so I'm glad to hear plugins are still seeing explosive
              growth for that great CMS.

              ~Tom
            • Neil Drumm
              Message 6 of 6 , May 5, 2008
                On Thu, May 1, 2008 at 11:04 AM, Josh Tauberer <tauberer@...> wrote:
                >
                > Neil Drumm wrote:
                > > I am doing a presentation on this tomorrow in Sunnyvale, CA at the
                > > News Tools Drupal day,
                > > http://www.mediagiraffe.org/wiki/index.php/Jtm-sv-drupal.
                >
                > If you have slides, please post them here!

                I did not use slides. The presentation was:
                * Brief tour of maplight.org
                * Brief tour of themiddleclass.org
                * Explanation of where the data comes from: THOMAS, GovTrack, and the Sunlight API
                * Demonstration of example code at
                http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/legislature/examples/
                ** Example 1 is a table of recent roll call votes by Clinton, Obama,
                and McCain. They have been missing a lot of votes.
                ** Example 2 changes
                <a title="House vote #233">House vote</a>
                into
                <a title="House vote #233"
                href="http://www.govtrack.us/congress/vote.xpd?vote=h2008-233">House
                vote</a> (passed 247 to 165)
                using Drupal's filter API, which covers all posting/blog/article/node text.
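
The transformation in Example 2 can be sketched outside Drupal as a plain text substitution. This is an illustration in Python, not the module's PHP filter: the `link_votes` function, the `TALLIES` lookup, and the explicit `year` argument are hypothetical, and only the URL pattern and the 247 to 165 tally come from the example above.

```python
import re

# Hypothetical lookup of results; a real filter would query the imported vote data.
TALLIES = {("h", 2008, 233): "passed 247 to 165"}

# Matches only the House form shown in the example above.
LINK_RE = re.compile(r'<a title="House vote #(\d+)">(.*?)</a>', re.DOTALL)

def link_votes(text, year):
    """Add a GovTrack href and, when known, the tally to House vote links."""
    def repl(match):
        number, label = int(match.group(1)), match.group(2)
        url = f"http://www.govtrack.us/congress/vote.xpd?vote=h{year}-{number}"
        link = f'<a title="House vote #{number}" href="{url}">{label}</a>'
        tally = TALLIES.get(("h", year, number))
        return f"{link} ({tally})" if tally else link
    return LINK_RE.sub(repl, text)

html = link_votes('<a title="House vote #233">House vote</a>', 2008)
```

Links whose tally is unknown still get the href but no parenthetical, so the filter never prints a result it cannot back up.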

                --
                Neil Drumm
                http://delocalizedham.com