
Re: [govtrack] Coding projects

  • Leonard Linde
    Message 1 of 6 , May 24, 2008
      On earmarks: Earmarks are released in committee reports as poorly scanned embedded images. Each committee uses its own format for the scanned portion of the report. It's a tough nut, on purpose, I assume.

      Taxpayers for Common Sense has gathered last year's earmarks in an Excel spreadsheet. I've parsed that spreadsheet and associated the (text) names with the govtrack voter id. I can send you a SQL dump of the database if you're interested. TCS requires non-commercial use and attribution to use it.


      ----- Original Message ----
      From: Josh Tauberer <tauberer@...>
      To: GovTrack List <govtrack@yahoogroups.com>
      Sent: Friday, May 23, 2008 12:39:03 PM
      Subject: [govtrack] Coding projects


      Now that GovTrack is open source (http://www.govtrack.us/source.xpd) I
      am going to try to be a little more encouraging to get others involved
      in maintaining and improving the website.

      Here are some possible things that you might be interested in working on:

      Scraping more data- committee and conference reports (good for finding
      earmarks), news articles, scraping general committee information

      Parsing bills- finding earmarks, relating bills to laws being amended,
      comparing bills and tracking the evolution of bills better.

      Improving site features- new ways to search legislation (e.g. by
      sponsor), visualizations of legislation language and evolution, tagging
      bills, visualization of legislative statistics, generation of new
      statistics and graphs

      Extending the site to new areas- browsing the constitution, U.S. code,
      regulations, judicial documents, state-level legislation

      I'm more than happy to help you work on these things if you're interested.

      --
      - Josh Tauberer
      - GovTrack.us

      http://razor.occams.info

      "Yields falsehood when preceded by its quotation! Yields
      falsehood when preceded by its quotation!" Achilles to
      Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
    • Roger Williams
      Message 2 of 6 , Jul 20, 2008
        Hello Leonard:

        I was looking at this spreadsheet. I am interested in how you got
        that data correlated with the govtrack voter id.

        Regards..RogerW
      • Leonard Linde
        Message 3 of 6 , Jul 20, 2008
          I exported it to CSV and wrote a Python (Django framework) program to match, by name, to the govtrack.us people file, which I already have in a MySQL table.

          It was an iterative process because the Excel file's formatting is not 100% clean: some names were misspelled, some punctuation was missing, etc. IIRC, I had to edit one or two lines in the file.
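
A minimal sketch of this kind of name matching, for illustration only. It is not the program described above: it uses only the Python standard library rather than Django and MySQL, and the people list, ids, column name (`member`), and similarity cutoff are all hypothetical. It normalizes names to absorb missing punctuation, then falls back to fuzzy matching to catch misspellings.

```python
import csv
import difflib
import io
import re
import unicodedata


def normalize(name):
    """Lowercase a name, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    letters_only = re.sub(r"[^a-z\s]", " ", ascii_only.lower())
    return " ".join(letters_only.split())


def build_index(people):
    """Map normalized name -> id for an iterable of (id, name) pairs."""
    return {normalize(name): person_id for person_id, name in people}


def match_name(raw_name, index, cutoff=0.85):
    """Return (person_id, how); how is 'exact', 'fuzzy', or 'unmatched'."""
    key = normalize(raw_name)
    if key in index:
        return index[key], "exact"
    # Fall back to the closest normalized name, if one is similar enough.
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    if close:
        return index[close[0]], "fuzzy"
    return None, "unmatched"


def match_csv(csv_file, index, name_column="member"):
    """Yield (raw_name, person_id, how) for each row of an open CSV file."""
    for row in csv.DictReader(csv_file):
        person_id, how = match_name(row[name_column], index)
        yield row[name_column], person_id, how


# Hypothetical people table and spreadsheet export, made up for illustration.
PEOPLE = [(900001, "Jane Q. Example"), (900002, "José Sample-Rivera")]
EXPORT = (
    "member,amount\n"
    "Jane Q Example,1000\n"
    "Jose Sample Rivera,2500\n"
    "Jane Q. Exmaple,300\n"
    "Nobody Atall,75\n"
)

if __name__ == "__main__":
    index = build_index(PEOPLE)
    for raw_name, person_id, how in match_csv(io.StringIO(EXPORT), index):
        print(raw_name, person_id, how)
```

Rows that come back as 'fuzzy' or 'unmatched' are the ones worth checking by hand, which is where the iteration (and the occasional manual edit to the file) would come in.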

          I did not treat the program as something I'd re-use, because I assumed that the Taxpayers for Common Sense file format will change, and even if it doesn't, next year's file will have a different set of formatting errors.

          I think my result is at least 99% correct. I'd be happy to export the resulting data in any format you'd like and send it to you, as long as you follow the attribution policy of TCS. I'll send you the program, too, for what it's worth.



        • Roger Williams
          Message 4 of 6 , Jul 24, 2008
            Hello Leonard:

            I am interested in the .csv version of the data as well as the
            program. I think I saw somewhere that Josh wants to get stuff in Perl
            [for consistency with the other parts of the GovTrack implementation].
            So I can use the program as a guide.

            TIA.

            Regards..RogerW
          • Leonard Linde
            Message 5 of 6 , Jul 24, 2008
              I just sent you a zip file with the data and program.


