Re: Initial source code release

  • damianmont
    Message 1 of 8 , Jul 5, 2007
      --- In govtrack@yahoogroups.com, Josh Tauberer <tauberer@...> wrote:
      > The repository is browsable here:
      > http://razor.occams.info/code/repo/?/govtrack/gather/us

      Thank you Josh as always.

      Any way you could just document what each file does?

      Here's the list:
      database.people.sql
      database.tables2.sql
      database.tables.sql
      db.pl
      general.pl
      indexing.pl
      parse_record.pl
      parse_rollcall.pl
      parse_status.pl
      personaldb.pl
      sql.pl
      util.pl

      (p.s. EXCELLENT article on XML.com... That's how I found your site)

      Also, could you send us a rough list of the pages you scrape?
      I'll probably get them from reading the scripts, but I'm a PHP guy, not
      a Perl guy... but a programmer is a programmer, right? I should be able
      to figure it out.
    • Joshua Tauberer
      Message 2 of 8 , Jul 5, 2007
        --- In govtrack@yahoogroups.com, "damianmont" <photoca@...> wrote:
        > Any way you could just document what each file does?
        >
        > Here's list:

        Sure.

        > database.people.sql
        MySQL table schema and data for "people" table of all people that have
        served in Congress (name, birthday, etc.), and "people_roles" table
        for every role in Congress each person has served (role type
        (senator/representative), start/end date, party, etc.).

        > database.tables2.sql
        > database.tables.sql
        MySQL table schemas for the other tables, which are filled in by the
        scripts and are mostly used for indexing bills. The people tables are
        the only ones I edit by hand; they are not automatically generated
        from some other source.

        You'll need to pipe these to mysql, or otherwise load them, for any of
        the scripts to work. (The indexing tables aren't strictly necessary if
        you disable the indexing routines one way or another, but the people
        tables are pretty critical for all of the parsing scripts.)
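A minimal sketch of the loading step described above, piping each .sql file to the mysql client. The database name `govtrack`, the load order, and the `runner` hook are assumptions made for illustration; none of them is stated in this thread, and credentials are left to the client's own configuration.

```python
import subprocess
from pathlib import Path

# File names come from the listing in this thread; the order is an assumption.
SCHEMA_FILES = ["database.people.sql", "database.tables.sql", "database.tables2.sql"]


def mysql_command(database):
    """Return the argv for a mysql client that reads SQL from stdin."""
    return ["mysql", database]


def load_schemas(directory, database="govtrack", runner=subprocess.run):
    """Pipe each schema file to mysql; `database` is a hypothetical name."""
    loaded = []
    for name in SCHEMA_FILES:
        path = Path(directory) / name
        with path.open("rb") as sql:
            # Equivalent to the shell form: mysql govtrack < database.people.sql
            runner(mysql_command(database), stdin=sql, check=True)
        loaded.append(name)
    return loaded
```

Calling `load_schemas(".")` from the checkout directory would run the client three times and raise if any load fails.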

        > db.pl
        Utility script for opening the MySQL database.

        > general.pl
        Really old utility functions that I don't really use.

        > indexing.pl
        Subroutines that update the indexing MySQL tables based on the
        contents of a bill or vote file.

        > parse_record.pl
        Parses the Congressional Record from THOMAS.

        > parse_rollcall.pl
        Parses roll call pages from the House and Senate websites.

        > parse_status.pl
        Parses bill status pages from THOMAS.

        > personaldb.pl
        Converts a representative's name into an ID. Uses a date, role type,
        and state/district info to disambiguate when the name is ambiguous.
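The kind of lookup described for personaldb.pl can be sketched roughly as follows. This is not the Perl script's actual logic: the `Role` record, its field names, and the `"sen"`/`"rep"` codes are hypothetical, and district matching is left out for brevity (it would be one more optional filter).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Role:
    person_id: int
    lastname: str
    role_type: str  # "sen" or "rep" (hypothetical codes)
    state: str
    start: date
    end: date


def find_person_id(roles, lastname, when, role_type=None, state=None):
    """Return the one matching person ID, or None if zero or several match."""
    matches = {
        r.person_id
        for r in roles
        if r.lastname.lower() == lastname.lower()
        and r.start <= when <= r.end
        and (role_type is None or r.role_type == role_type)
        and (state is None or r.state == state)
    }
    return matches.pop() if len(matches) == 1 else None


# Made-up example rows: two people who share a last name.
ROLES = [
    Role(1, "Smith", "rep", "OR", date(2005, 1, 3), date(2007, 1, 3)),
    Role(2, "Smith", "sen", "NJ", date(2005, 1, 3), date(2011, 1, 3)),
]
```

With these rows, a bare last name in mid-2006 stays ambiguous and returns None, while adding `role_type="sen"` or a 2008 date narrows it to person 2.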

        > sql.pl
        Utility functions for dealing with MySQL (preparing SQL statements
        programmatically).

        > util.pl
        A ton of utility functions used throughout.

        > (p.s. EXCELLENT article on XML.com... That's how I found your site)

        Thanks!

        > Also could you maybe send us more or less the pages you scrape?
        > I'll probably get them from reading the script, but I'm a php guy, not
        > perl...but a programmer is a programmer right? I should be able to
        > figure it out.

        That's a long list. Maybe another time!

        - Josh
      • tay199
        Message 3 of 8 , Jul 6, 2007
          YEAH! Thanks Josh. I'll keep the group posted on any patches or things
          of value we find and can contribute.

          Taylor
        • Kevin Henry
          Message 4 of 8 , Jul 8, 2007
            Great, thanks Josh...

            I didn't do much scripting (proper) for http://www.whereabill.org/,
            but since Josh has started the ball rolling, I'll add in the one
            script I did write: an XSLT file that takes the XML version of Josh's
            people database
            (http://www.govtrack.us/data/us/110/repstats/people.xml) and extracts
            the people/roles relevant to a single specified Congress.

            The problem (for my purposes, which include sending this information
            to the browser client on each request) is that the people.xml file is
            large (6.3MB), and includes lots of dead people. :) So I use this
            script to get only the information for a particular Congress.

            Kevin


            bioseparate.xml:

            <?xml version="1.0" encoding="UTF-8"?>
            <!-- Copies people.xml, keeping only people with a role in $congress. -->
            <xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common">
              <xsl:output method="xml" version="1.0" encoding="ISO-8859-1"
                  indent="yes"/>

              <xsl:param name="congress">110</xsl:param>

              <!-- The two calendar years of the Congress: 2007 and 2008 for 110. -->
              <xsl:variable name="year1" select="number($congress)*2 + 1787"/>
              <xsl:variable name="year2" select="number($congress)*2 + 1788"/>

              <xsl:template match="people">
                <xsl:copy>
                  <!-- Pass 1: copy each person with only the roles whose
                       start/end years span either year of the Congress. -->
                  <xsl:variable name="striprole">
                    <xsl:for-each select="person">
                      <xsl:copy>
                        <xsl:copy-of select="@*"/>
                        <xsl:copy-of select="role[
                            ($year1 &gt;= number(substring-before(@startdate,'-')) and
                             $year1 &lt;= number(substring-before(@enddate,'-'))) or
                            ($year2 &gt;= number(substring-before(@startdate,'-')) and
                             $year2 &lt;= number(substring-before(@enddate,'-')))]"/>
                      </xsl:copy>
                    </xsl:for-each>
                  </xsl:variable>
                  <!-- Pass 2: drop people left with no roles. -->
                  <xsl:for-each select="exsl:node-set($striprole)/person[role]">
                    <xsl:copy-of select="."/>
                  </xsl:for-each>
                </xsl:copy>
              </xsl:template>

            </xsl:stylesheet>
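The stylesheet's year arithmetic (congress * 2 + 1787 for the first year, + 1788 for the second) can be sketched outside XSLT as below. The function names are made up for illustration; the 'YYYY-MM-DD' date shape is inferred from the stylesheet's use of substring-before(@startdate, '-').

```python
def congress_years(congress):
    """Return the two calendar years a Congress covers, e.g. 110 -> (2007, 2008)."""
    first = congress * 2 + 1787
    return first, first + 1


def role_in_congress(startdate, enddate, congress):
    """True if the role's start..end years include either year of the Congress.

    Only the year part of each 'YYYY-MM-DD' string is compared, as in the
    stylesheet.
    """
    start_year = int(startdate.split("-")[0])
    end_year = int(enddate.split("-")[0])
    return any(start_year <= year <= end_year for year in congress_years(congress))
```

Because only years are compared, a role that ended in early January of a Congress's first year still counts as belonging to that Congress, so the filtered file may keep a few members whose last term had just expired.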
          • damianmont
            Message 5 of 8 , Jul 10, 2007
              Kevin,

              Love that www.WhereaBill.org site.
              Do you use the information from Josh's XML files?

              Love the mashup, very well done.

              --- In govtrack@yahoogroups.com, "Kevin Henry" <k@...> wrote:
              >
              > Great, thanks Josh...
              >
              > I didn't do much scripting (proper) for http://www.whereabill.org ,
              > but since Josh has started the ball rolling, I'll add in the one
              > script I did write: an XSLT file that takes the XML version of Josh's
              > people database
              > (http://www.govtrack.us/data/us/110/repstats/people.xml) and extracts
              > the people/roles relevant to a single specified Congressional session.
            • Peggy Garvin
              Message 6 of 8 , Jul 10, 2007
                I'd like to know, too. I am writing a brief article (right now, due tomorrow) about some of the new legislative info projects and want to mention whereabill as well as a sample of sites that have used Govtrack's file.

                Thanks,
                Peggy Garvin
                peggy -at- garvinconsulting.com


                damianmont <photoca@...> wrote:
                Kevin,

                Love that www.WhereaBill.org site.
                You use the information from josh's xml files?

                Love the mashup, very well done.

                --- In govtrack@yahoogroups.com, "Kevin Henry" <k@...> wrote:
                >
                > Great, thanks Josh...
                >
                > I didn't do much scripting (proper) for http://www.whereabill.org ,
                > but since Josh has started the ball rolling, I'll add in the one
                > script I did write: an XSLT file that takes the XML version of Josh's
                > people database
                > (http://www.govtrack.us/data/us/110/repstats/people.xml) and extracts
                > the people/roles relevant to a single specified Congressional session.


              • Kevin Henry
                Message 7 of 8 , Jul 11, 2007
                  Peggy and Damian,

                  Thanks, glad you like the site.

                  I'm getting all the data from govtrack. Specifically, I'm using the
                  following files:

                  - the bill status data (www.govtrack.us/data/us/*/bills/*.xml)
                  - the roll vote data (/data/us/*/rolls/*.xml)
                  - the people database (/data/us/110/repstats/people.xml)
                  - the popularity listing (/data/us/bills.technorati.xml)
                  - the search service (/congress/billsearch_api.xpd)

                  I keep a copy of all the files on my server, and do a daily rsync (as
                  Josh describes here: http://www.govtrack.us/source.xpd) to stay current.
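Given a local mirror laid out like the paths listed above, finding the bill or roll call files for one Congress might look like the sketch below. The mirror root, the function name, and the reading of the first `*` as the Congress number are assumptions; the thread does not spell out the directory layout beyond those patterns.

```python
from pathlib import Path


def mirrored_files(mirror_root, congress, kind):
    """List XML files of one kind ('bills' or 'rolls') for one Congress.

    Follows the patterns /data/us/*/bills/*.xml and /data/us/*/rolls/*.xml,
    treating the first * as the Congress number (an assumption).
    """
    if kind not in ("bills", "rolls"):
        raise ValueError(f"unknown kind: {kind!r}")
    directory = Path(mirror_root) / "data" / "us" / str(congress) / kind
    return sorted(directory.glob("*.xml"))
```

For example, `mirrored_files("/srv/govtrack", 110, "bills")` would return the sorted bill status files under a hypothetical `/srv/govtrack` mirror.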

                  Basically, when the server gets a request for a certain bill, it
                  retrieves the status data and goes through the action items, parsing
                  them into the steps that will be represented in the "driving
                  directions". It also retrieves any relevant roll vote data, and then
                  sends that information (along with the summary, the titles, the list
                  of sponsors, and the biographical data for that session of Congress)
                  back to the client, which renders everything (with the help of the
                  Google Maps API).

                  So it's really a (sort-of) UI sitting on top of govtrack's (sort-of) API.

                  Let me know if you need any more information...


                  Regards,
                  Kevin
                  http://www.whereabill.org/