
xml versions of some bills on Thomas

  • Neal McBurnett
    Message 1 of 9, Feb 25, 2005
      I found this an interesting read:

      Linking Architecture and the U.S. House of Representatives
      http://www.oreillynet.com/pub/wlg/4520

      The Library of Congress' Thomas web site (named for a former
      resident of the town where I live) is now making some new
      legislation available in XML. The XML points to XSLT stylesheets
      that format it for viewing....

      See also
      http://thomas.loc.gov/home/xml_help.html

      Check out, e.g.,

      http://thomas.loc.gov/home/gpoxmlc108/h3701_ih.xml

      for an xml version of a bill, displayed via a stylesheet.

      I don't see other links to them, or more recent bills - I hope they
      haven't given up on this.

      Clean XML versions of bills would make so many things easier....

      Cheers,

      Neal McBurnett http://bcn.boulder.co.us/~neal/
      Signed and/or sealed mail encouraged. GPG/PGP Keyid: 2C9EBA60
    • Scott Beardsley
      Message 2 of 9, Feb 26, 2005
        --- Neal McBurnett <neal@...> wrote:

        > Check out, e.g.,
        >
        > http://thomas.loc.gov/home/gpoxmlc108/h3701_ih.xml
        >

        Interesting... It looks like they may have solved our
        little person unique id issue too. From another bill
        in the same directory (h3701_ih.xml):

        ...<sponsor name-id="D000597">Mrs. Jo Ann Davis of
        Virginia</sponsor> (for herself, <cosponsor
        name-id="S000522">Mr. Smith of New
        Jersey</cosponsor>...

        Notice the attribute 'name-id'. The DTD doesn't give
        too much info about how that is generated:

        name-id The identification number of an individual

        I've got an email into
        xml-bill-comments@... for more info about
        name-id.
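A minimal sketch of how the name-id attributes in the excerpt above might be pulled out with Python's standard library. The `<action-desc>` wrapper element is an assumption added here only so the fragment is well-formed, and `extract_name_ids` is a hypothetical helper name, not something defined by the House DTD.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the excerpt quoted above. The <action-desc> wrapper
# is an assumption added only to make the fragment well-formed XML.
SAMPLE = (
    '<action-desc>'
    '<sponsor name-id="D000597">Mrs. Jo Ann Davis of Virginia</sponsor>'
    ' (for herself, '
    '<cosponsor name-id="S000522">Mr. Smith of New Jersey</cosponsor>)'
    '</action-desc>'
)


def extract_name_ids(xml_text):
    """Return (role, name-id, display name) for each sponsor/cosponsor element."""
    root = ET.fromstring(xml_text)
    people = []
    for elem in root.iter():  # walks the root and all descendants in document order
        if elem.tag in ("sponsor", "cosponsor"):
            people.append((elem.tag, elem.get("name-id"), (elem.text or "").strip()))
    return people


if __name__ == "__main__":
    for role, name_id, name in extract_name_ids(SAMPLE):
        print(role, name_id, name)
```

The ID is read as an opaque string; nothing here tries to interpret its letters or digits.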

        Also, it looks like this has been in the works for
        quite some time. From bill.dtd:

        INITIAL DRAFT VERSION ... v1.0 19980505

        with changes as recent as: December 27, 2004

        I wonder if there are any LoC lurkers that want to
        spill the beans about the status/scope of this
        project?

        Scott



      • Joshua Tauberer
        Message 3 of 9, Feb 27, 2005
          Neal McBurnett wrote:
          > Linking Architecture and the U.S. House of Representatives
          > http://www.oreillynet.com/pub/wlg/4520

          Hey, Neal.

          Yeah, it's pretty great that the House has taken up the idea of XML.
          It's too bad the Senate hasn't yet, though.

          > I don't see other links to them, or more recent bills - I hope they
          > haven't given up on this.

          As far as I understand it, all (most?) new legislation in the House is
          still being written in XML. More at http://xml.house.gov/.

          > Clean XML versions of bills would make so many things easier....

          I personally don't think it helps *that* much. You don't really get
          much more than paragraph boundaries. It's better than having it only in
          PDF format (and text derived from that), of course.

          Scott Beardsley wrote:
          > Interesting... It looks like they may have solved our
          > little person unique id issue too. From another bill
          > in the same directory (h3701_ih.xml):
          >
          > ...<sponsor name-id="D000597">Mrs. Jo Ann Davis of
          > Virginia</sponsor> (for herself, <cosponsor
          > name-id="S000522">Mr. Smith of New
          > Jersey</cosponsor>...

          Hi, Scott. Not quite solved. It doesn't help when, for instance, we
          want to integrate state-level information with state-level politicians
          that aren't in the House's database of people. Still, unique IDs like
          these are a key component of having machine-readable data published by
          Congress; the catch is that there's no public list of name-ids right now.

          These are also the same IDs that you'll find if you look at the source
          of the roll call vote records on the house website, which are
          underlyingly XML (though the output you see might depend on your
          browser, supposedly).

          > I wonder if there are any LoC lurkers that want to
          > spill the beans about the status/scope of this
          > project?

          I've been in touch with someone that knows the people that are directly
          involved with this (I don't think he's on this list...), and through him
          I'm going to get to meet next week with some of the techies that are
          involved in how the House publishes data on their website. Maybe some
          will join this mailing list. I'll let you all know how that goes.

          Keep the conversation going.

          --
          - Joshua Tauberer

          http://taubz.for.net

          ** Nothing Unreal Exists **
        • Bill Farrell
          Message 4 of 9, Feb 27, 2005
            Hi,

            I'm still catching up to the thread and wasn't aware that y'all weren't aware of Thomas's person-id fix. A couple of weeks ago I discovered the Thomas trove and set about combining several data sources into a comprehensive data store on my Pythia site. I had to scrape Thomas and BioGuide to come up with a complete-ish dataset since apparently none of the information is in any ONE place :-P~~~

            The up-side is that I have ID's that tie all the way through the data stores, but still the individual information is somewhat incomplete. That is, I don't have ALL the web sites and ALL the email addresses/form names, but I do have most of everything else for 107th-109th Congress.

            If it's of any use, there is an experimental query by the new ID at
            http://pythia.progressivenation.net/modules/tinycontent3/index.php?id=14

            My schema is at
            http://pythia.progressivenation.net/dtd/congress.xsd and the schema doc is at
            http://pythia.progressivenation.net/dtd/congress_doc.html

            Note: using the first query will deliver an internal DTD with the document. I didn't add any other CONGRESS file queries (though it's NBD to throw some "by state", "by district", etc queries to the site). This first pass was more to help me catch up to the rest of this group on data stores.

            This particular schema matches the natural internal representation of a record in my database (nested post-relational/multidimensional). While that may seem odd to people still using traditional 1NF relational databases (which would require edge tables), it does allow me to keep and deliver a legislator's information within one record unit, retrievable with a single query.

            Of course, anything I have or am developing is open for use to the group. If you find any of it useful, e me and I'll gladly ship you what I've got. If you have suggestions or would like more information, let's put our heads together.

            Best regards,
            Bill

            ----- Original Message -----
            From: Scott Beardsley <sc0ttbeardsley@...>
            To: govtrack@yahoogroups.com
            Sent: Sun, 27 Feb 2005 02:49:55 +0000
            Subject: Re: [govtrack] xml versions of some bills on Thomas


            >
          • Scott Beardsley
            Message 5 of 9, Feb 27, 2005
              --- Bill Farrell <jwwf@...> wrote:

              > Of course, anything I have or am developing is open
              > for use to the group. ... If you
              > have suggestions or would like more information,
              > let's put our heads together.

              Excellent! Yeah I think kicking these ideas around for
              a bit will help everyone understand. I'll poke around
              a bit more on your site.

              > It doesn't help when, for instance, we
              > want to integrate state-level information with
              > state-level politicians that aren't in the House's
              > database of people.

              Hmmm ya you're probably right. I think this is just
              pointing to an id map for the different levels of govt
              or wherever there is a different id method (ie if
              each session causes an id reset). But, this assumes
              all levels have assigned unique ids (imagine the
              organization level/tech ability of *your* county
              govt). sheesh. This problem only gets worse when
              integrating past data and data before the info age.
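The "id map" idea above could be sketched roughly as follows: each source (a level of government, or a single session if IDs reset) is treated as its own scheme, and every (scheme, local id) pair points at one internal person key. All class, method, and scheme names and the IDs in the usage lines are hypothetical illustrations, not an existing format.

```python
from collections import defaultdict


class IdCrosswalk:
    """Maps (scheme, local id) pairs onto a single internal person key."""

    def __init__(self):
        self._to_person = {}                 # (scheme, local_id) -> person_key
        self._by_person = defaultdict(set)   # person_key -> {(scheme, local_id), ...}

    def link(self, person_key, scheme, local_id):
        """Record that local_id in the given scheme refers to person_key."""
        existing = self._to_person.get((scheme, local_id))
        if existing is not None and existing != person_key:
            raise ValueError(f"{scheme}:{local_id} already linked to {existing}")
        self._to_person[(scheme, local_id)] = person_key
        self._by_person[person_key].add((scheme, local_id))

    def resolve(self, scheme, local_id):
        """Return the internal person key, or None if the id is unknown."""
        return self._to_person.get((scheme, local_id))

    def ids_for(self, person_key):
        """Return every known (scheme, local id) pair for a person, sorted."""
        return sorted(self._by_person.get(person_key, ()))


if __name__ == "__main__":
    xwalk = IdCrosswalk()
    # Made-up IDs; the session year is part of the scheme name in case
    # a body resets its numbering every session.
    xwalk.link("person-1", "house-name-id", "X000001")
    xwalk.link("person-1", "county-2004", "42")
    print(xwalk.resolve("county-2004", "42"))
    print(xwalk.ids_for("person-1"))
```

This only helps where a source assigns IDs at all; for bodies that publish names without identifiers, the links would still have to be made by hand.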

              So, how do politicians become official? Do they have
              to file some document with the state/county/feds when
              they decide to run? Bah, that's probably different in
              every level too. Talk about requiring a driver's
              license to vote... how about one to run? heh I'm just
              thinking out loud now.

              Scott




            • Bill Farrell
              Message 6 of 9, Feb 27, 2005
                Hey Scott,

                I run into a lot of the same problems doing genealogy -- there's a *LOT* of handscrawl that comes out of the pre-information age. Shoot, looking around the net, there isn't a lot better at the moment :-) But one of the things I do to put dindin on the table is to take a load of scrawled "just-stuff", lay it out, pick it apart, find the relationships and make a retrievable database out of it.

                It's also clear that so far HAVA is a thoroughgoing mess with haphazard implementations patchworked across the country. The largest part of the problem is that while Congress mandated uniform voting procedures, the funding that was supposed to enable that to happen never appeared (can we say, "No child left behind"??). At the local level we're up against a lot of "this is the way we've always done it".

                I'm in contact with some precinct captains, BOE chairpersons and volunteers -- the one theme that resounds nationally is that there are no standards for anything. The way the current HAVA law is laid out, there aren't likely to be any, any time soon. With this in mind, there are a lot of grass-roots efforts where local boards are comparing notes and procedures with other boards to try and create some.

                Even at that, it will be a long time before much of anything below state level is either accessible or uniform :-( This is where we have an opportunity to support the xml.house.gov technical group, to meet with and gather ideas from our local BOE's, then to work out the best sorts of interchange amongst the entirety.

                Being the nearly newest in the govtrack e-group, I'm not sure where everyone else is in their projects, what kinds of data we're scraping/normalizing/publishing, what we have on hand because someone sat down and keyed it in, or if any interchange agreements have been established. I don't know how much I don't know yet :LOL:

                A little about why I'm excited about this particular project. Before it was "IT" and somewhere after the Dark Ages (DP or Data Processing) there was MIS (Management Information Systems). That's where I grew up. That was the Huge Room Where It All Came Together. MIS had consolidated and normalized all your company's disparate "stuff" together where anyone could draw out the appropriate "stuff", mined in an appropriate way. Enter Steve Jobs and Bill Gates. MIS was slowly dismantled, the larger jobs were "outsourced" and the smaller jobs scattered to desktops. In the 80's outsourcing craze, many companies lost the ability to integrate and reanalyze their stores in flexible ways. But everyone had a PC on their desk, b'damn, even if they had no idea what to do with it.

                A generation has forgotten how to consolidate and reanalyze; our government's IT is clearly in a mess; "public information" is nearly an oxymoron; and we're repeating history by moving back from desktop DP to management information. The system is a mess, but I can see on this list a way to Do Something Substantial.

                Much of the work I did in the late 70's and early 80's was to catch a chunk of data that pooted out of the end of one process, then figure out how to make it palatable to another process that could use pieces of the first to flesh-out or add new ways of looking at its own stores. To me, finding, scraping, normalizing and integrating the raw sources are the child's play part. Publishing/Interchange comes a bit more tricky.

                These are the things I'm here to learn about the group members: who in the group has what kinds of information, who's authoritative on a given data store, what makes them authoritative (by source or by agreement), who can/does mirror components, how are updates published and applied (timeliness), when is a store considered stale, if it's stale should it offer a non-authoritative answer, or none? In return, what may I offer that's of value in moving forward?

                Best!
                Bill

                ----- Original Message -----
                From: Scott Beardsley <sc0ttbeardsley@...>
                To: govtrack@yahoogroups.com
                Sent: Sun, 27 Feb 2005 17:20:04 +0000
                Subject: Re: [govtrack] xml versions of some bills on Thomas


                >
              • Joshua Tauberer
                Message 7 of 9, Feb 28, 2005
                  Bill Farrell wrote:
                  > A couple of weeks ago I discovered the Thomas trove and set about combining several data sources into a comprehensive data store on my Pythia site. I had to scrape Thomas and BioGuide to come up with a complete-ish dataset since apparently none of the information is in any ONE place :-P~~~

                  Neat. Bioguide was on my list of things to eventually scrape. Were you
                  able to get everything out of bioguide? I'd be interested in seeing
                  all of that data.

                  I'm in the process of setting up all of my data for GovTrack in RDF.
                  Right now my server is trying to get it all (roughly 3 million
                  'statements') into MySQL... it takes a while. But, the wonders of RDF
                  don't begin until someone else uses some of the same RDF vocabularies to
                  describe other related info. If you're interested in working on
                  exporting some of your information in RDF (even at the least the list of
                  IDs that the House is using), let's talk more about that.
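As a rough illustration of what exporting the list of IDs in RDF could look like, the sketch below writes one N-Triples statement per person using the FOAF `name` property. The `example.org` subject URI pattern is a placeholder assumption, not a URI scheme that GovTrack or the House publishes, and the helper names are hypothetical.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Placeholder URI pattern (an assumption for illustration only).
PERSON_URI = "http://example.org/people/{name_id}"


def escape_literal(text):
    """Escape the characters N-Triples requires inside a quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(members):
    """Turn a {name-id: display name} mapping into N-Triples text."""
    lines = []
    for name_id, name in sorted(members.items()):
        subject = PERSON_URI.format(name_id=name_id)
        lines.append(f'<{subject}> <{FOAF_NAME}> "{escape_literal(name)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    # The two IDs and names quoted earlier in this thread.
    print(to_ntriples({
        "D000597": "Mrs. Jo Ann Davis of Virginia",
        "S000522": "Mr. Smith of New Jersey",
    }))
```

The payoff described above only arrives once two data sets agree on the subject URIs, so the choice of URI pattern matters more than the serialization.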

                  > Being the nearly newest in the govtrack e-group, I'm not sure where everyone else is in their projects

                  Scott was going to work on California historical election data (and
                  possibly current legislative info). Have you gotten started on that, Scott?

                  I don't know if there are any other parallel projects that got anywhere yet.

                  > if any interchange agreements have been established.

                  Not that I know of. ParticipatoryPolitics.org is actively working on a
                  site that will use GovTrack's data, but of course anyone is welcome to
                  do that.

                  All of this discussion is really just beginning.

                  > These are the things I'm here to learn about the group members: who in the group has what kinds of information, who's authoritative on a given data store, what makes them authoritative (by source or by agreement),

                  These are good questions that we don't have any answers for yet (in part
                  because there is only a very small number of sources of data). I hope
                  OGDEX.com (which isn't working for me at the moment) will become the
                  place with the answers to those types of questions.

                  > who can/does mirror components, how are updates published and applied (timeliness), when is a store considered stale, if it's stale should it offer a non-authoritative answer, or none?

                  More good questions that need to be worked out. Something to think
                  about is whether we need a new format to specify, for a data source, how
                  to retrieve its data, how often it's updated, who owns and creates the
                  data, etc. Something that will make it easy to gather and mirror data
                  from an array of sources.
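One possible shape for such a data-source description, sketched as a small Python record with a staleness check. The field names, the example URL, and the owner string are all hypothetical; no such format had been agreed on in this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataSource:
    """Describes where a data set lives, who owns it, and how fresh it should be."""
    name: str
    url: str                      # where to retrieve the data
    owner: str                    # who creates and owns the data
    authoritative: bool           # original source, or a mirror?
    update_interval: timedelta    # how often the source is expected to change
    last_fetched: datetime        # when this copy was last refreshed

    def is_stale(self, now):
        """A copy is stale once it is older than the expected update interval."""
        return now - self.last_fetched > self.update_interval


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    mirror = DataSource(
        name="house-name-ids",
        url="http://example.org/data/name-ids.xml",
        owner="example mirror operator",
        authoritative=False,
        update_interval=timedelta(days=1),
        last_fetched=datetime(2005, 2, 26, 12, 0),
    )
    print(mirror.is_stale(datetime(2005, 2, 28, 12, 0)))
```

Whether a stale mirror should answer with a non-authoritative result or refuse is a policy question the record leaves open; it only exposes the facts a client would need to decide.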

                  --
                  - Joshua Tauberer

                  http://taubz.for.net

                  ** Nothing Unreal Exists **
                • Scott Beardsley
                  Message 8 of 9, Feb 28, 2005
                    --- Joshua Tauberer <tauberer@...> wrote:

                    > Scott was going to work on California historical
                    > election data (and
                    > possibly current legislative info). Have you gotten
                    > started on that, Scott?

                    I was originally working on parsing bill data into xml
                    but the discovery of aroundthecapitol.com put the
                    brakes on that work. I've yet to hear back from the
                    site's creator (the other Scott) about licensing and
                    other details so I might revisit this later if he is
                    uninterested.

                    I'm working on manually digitizing (ie Hard Copy ->
                    Digital Photo -> Spreadsheet -> XML) California
                    election data now. I've gathered the last 5 years of
                    CA elections into a spreadsheet for each election
                    (I've found normal spreadsheet apps to be much faster
                    than going directly to xml). I'm almost done with a
                    perl script to translate those spreadsheets into xml.
                    I wanted to get a few years of data to fully
                    understand what type of data I'm working with. I've
                    found that I can finish a full election in about 20-30
                    hours so I estimate all of CA's election data should
                    take one person a year of full time data entry.
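The spreadsheet-to-XML step described above might look roughly like this in Python, assuming each election's spreadsheet is exported as CSV. The column names (`contest`, `candidate`, `votes`) and the element names are invented for illustration; the actual script mentioned in the message is written in Perl and its layout is not shown in this thread.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_xml(csv_text, election_id):
    """Convert CSV rows (contest, candidate, votes) into one <election> document."""
    root = ET.Element("election", {"id": election_id})
    for row in csv.DictReader(io.StringIO(csv_text)):
        result = ET.SubElement(root, "result", {"contest": row["contest"]})
        ET.SubElement(result, "candidate").text = row["candidate"]
        ET.SubElement(result, "votes").text = row["votes"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up rows, not real election results.
    sample = "contest,candidate,votes\nGovernor,Candidate A,100\nGovernor,Candidate B,90\n"
    print(rows_to_xml(sample, "example-election"))
```

Keeping data entry in a spreadsheet and generating the XML afterwards means the XML layout can change later without anyone retyping the figures.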

                    It may be possible to use OCR software to automate
                    some of this. I'm taking digital photos for older
                    elections. I'll send out a Flickr set link when I have
                    it ready.

                    > I don't know if there are any other parallel
                    > projects that got anywhere yet.

                    For California: aroundthecapitol.com but Scott (the
                    other one) hasn't shown any interest in joining
                    govtrack/ogdex yet.





                  • Scott Beardsley
                    Message 9 of 9, Mar 1, 2005
                      --- Scott Beardsley <sc0ttbeardsley@...> wrote:

                      > I've got an email into
                      > xml-bill-comments@... for more info about
                      > name-id.

                      I'm still not sure if there is an underlying source of
                      the name-id but it seems they are becoming
                      standardized on whatever the bioguide is using.

                      Scott

                      From "Carmel, Joe" <joe.carmel@...>:

                      For the House, the name-id is the Member's id from
                      http://bioguide.congress.gov This provides a unique
                      identification for each Member of Congress for all
                      time.

                      The ids are unique and you should not assume anything
                      about their numbering; if anything you should assume
                      the numbering is random (although unique). Do not
                      assume that a given name will begin with a specific
                      letter because they don't.




