
Re: [govtrack] xml versions of some bills on Thomas

  • Joshua Tauberer
    Message 1 of 9, Feb 28, 2005
      Bill Farrell wrote:
      > A couple of weeks ago I discovered the Thomas trove and set about combining several data sources into a comprehensive data store on my Pythia site. I had to scrape Thomas and BioGuide to come up with a complete-ish dataset since apparently none of the information is in any ONE place :-P~~~

      Neat. Bioguide was on my list of things to eventually scrape. Were you
      able to get everything out of bioguide? I'd be interested in seeing
      all of that data.

      I'm in the process of setting up all of my data for GovTrack in RDF.
      Right now my server is trying to get it all (roughly 3 million
      'statements') into MySQL... it takes a while. But, the wonders of RDF
      don't begin until someone else uses some of the same RDF vocabularies to
      describe other related info. If you're interested in working on
      exporting some of your information in RDF (even if it is only the list
      of IDs that the House is using), let's talk more about that.
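As a rough illustration of why shared vocabularies matter, the sketch below models RDF statements as plain (subject, predicate, object) tuples in Python. Every URI, predicate name, and value in it is made up for illustration (all under example.org); none of it is GovTrack's or anyone else's actual vocabulary. The only point is that two independently published sets of statements merge without any mapping step when they use the same identifier for the same thing.

```python
from collections import defaultdict

# Hypothetical identifier shared by both sources (not a real ID).
PERSON = "http://example.org/people/P000001"

# Statements published by one project (made-up data).
source_a = [
    (PERSON, "http://example.org/vocab/name", "Jane Example"),
    (PERSON, "http://example.org/vocab/sponsored", "http://example.org/bills/h1"),
]

# Statements published by a second project about the same subject.
source_b = [
    (PERSON, "http://example.org/vocab/birthYear", "1950"),
]


def merge(*sources):
    """Union the statements from several sources, grouped by subject."""
    by_subject = defaultdict(set)
    for statements in sources:
        for subject, predicate, obj in statements:
            by_subject[subject].add((predicate, obj))
    return by_subject


merged = merge(source_a, source_b)
for predicate, obj in sorted(merged[PERSON]):
    print(predicate, obj)
```

Because both sources use the same subject URI, the merged view holds all three facts about that person; had each source minted its own identifier, a separate mapping table would be needed first.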

      > Being the nearly newest in the govtrack e-group, I'm not sure where everyone else is in their projects

      Scott was going to work on California historical election data (and
      possibly current legislative info). Have you gotten started on that, Scott?

      I don't know if there are any other parallel projects that got anywhere yet.

      > if any interchange agreements have been established.

      Not that I know of. ParticipatoryPolitics.org is actively working on a
      site that will use GovTrack's data, but of course anyone is welcome to
      do that.

      All of this discussion is really just beginning.

      > These are the things I'm here to learn about the group members: who in the group has what kinds of information, who's authoritative on a given data store, what makes them authoritative (by source or by agreement),

      These are good questions that we don't have answers for yet (in part
      because there is only a very small number of sources of data). I hope
      OGDEX.com (which isn't working for me at the moment) will become the
      place with the answers to those types of questions.

      > who can/does mirror components, how are updates published and applied (timeliness), when is a store considered stale, if it's stale should it offer a non-authoritative answer, or none?

      More good questions that need to be worked out. Something to think
      about is whether we need a new format to specify, for a data source, how
      to retrieve its data, how often it's updated, who owns and creates the
      data, etc. Something that will make it easy to gather and mirror data
      from an array of sources.
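To make the idea of a per-source description a little more concrete, here is a minimal sketch in Python. No such format exists in this thread; the field names (`retrieval_url`, `update_interval`, `owner`, `authoritative`), the example values, and the staleness rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SourceDescriptor:
    """Hypothetical description of one data source (all fields are assumptions)."""
    name: str
    retrieval_url: str          # where a mirror would fetch the data from
    update_interval: timedelta  # how often the source says it is updated
    owner: str                  # who creates and maintains the data
    authoritative: bool         # original source (True) or a mirror (False)


def is_stale(descriptor, last_fetched, now):
    """Assumed rule: a copy is stale once more than one update interval has passed."""
    return now - last_fetched > descriptor.update_interval


example = SourceDescriptor(
    name="example-bills",
    retrieval_url="http://example.org/data/bills.xml",
    update_interval=timedelta(days=1),
    owner="Example Maintainer",
    authoritative=False,
)

fetched = datetime(2005, 2, 27, 12, 0)
print(is_stale(example, fetched, datetime(2005, 2, 28, 11, 0)))  # 23 hours later
print(is_stale(example, fetched, datetime(2005, 3, 1, 12, 0)))   # 48 hours later
```

Whether a stale mirror should still answer, and whether it should mark the answer as non-authoritative, is a policy question the descriptor alone does not settle; the `authoritative` flag only gives a client something to base that decision on.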

      --
      - Joshua Tauberer

      http://taubz.for.net

      ** Nothing Unreal Exists **
    • Scott Beardsley
      Message 2 of 9, Feb 28, 2005
        --- Joshua Tauberer <tauberer@...> wrote:

        > Scott was going to work on California historical
        > election data (and
        > possibly current legislative info). Have you gotten
        > started on that, Scott?

        I was originally working on parsing bill data into xml
        but the discovery of aroundthecapitol.com put the
        brakes on that work. I've yet to hear back from the
        site's creator (the other Scott) about licensing and
        other details so I might revisit this later if he is
        uninterested.

        I'm working on manually digitizing (i.e. Hard Copy ->
        Digital Photo -> Spreadsheet -> XML) California
        election data now. I've gathered the last 5 years of
        CA elections into one spreadsheet per election
        (I've found normal spreadsheet apps to be much faster
        than going directly to XML). I'm almost done with a
        Perl script to translate those spreadsheets into XML.
        I wanted to get a few years of data to fully
        understand what type of data I'm working with. I've
        found that I can finish a full election in about 20-30
        hours, so I estimate all of CA's election data should
        take one person a year of full-time data entry.
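The spreadsheet-to-XML step described above might look roughly like the following. This is a Python sketch, not the Perl script mentioned in the message, and the column names (`contest`, `candidate`, `votes`), the element names, and the sample rows are all invented for illustration; it assumes each spreadsheet has been exported as CSV.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample standing in for one spreadsheet exported as CSV.
SAMPLE_CSV = """contest,candidate,votes
Example Contest,Candidate A,100
Example Contest,Candidate B,80
"""


def csv_to_xml(csv_text, election_id):
    """Turn one election's CSV rows into an <election> element (assumed layout)."""
    root = ET.Element("election", id=election_id)
    for row in csv.DictReader(io.StringIO(csv_text)):
        result = ET.SubElement(root, "result", contest=row["contest"])
        ET.SubElement(result, "candidate").text = row["candidate"]
        ET.SubElement(result, "votes").text = row["votes"]
    return root


root = csv_to_xml(SAMPLE_CSV, "example-election")
print(ET.tostring(root, encoding="unicode"))
```

Keeping data entry in the spreadsheet and generating the XML afterwards means the XML layout can change later without retyping anything.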

        It may be possible to use OCR software to automate
        some of this. I'm taking digital photos for older
        elections. I'll send out a Flickr set link when I have
        it ready.

        > I don't know if there are any other parallel
        > projects that got anywhere yet.

        For California: aroundthecapitol.com but Scott (the
        other one) hasn't shown any interest in joining
        govtrack/ogdex yet.





      • Scott Beardsley
        Message 3 of 9, Mar 1 7:20 AM
          --- Scott Beardsley <sc0ttbeardsley@...> wrote:

          > I've got an email into
          > xml-bill-comments@... for more info about
          > name-id.

          I'm still not sure if there is an underlying source of
          the name-id but it seems they are becoming
          standardized on whatever the bioguide is using.

          Scott

          From "Carmel, Joe" <joe.carmel@...>:

          For the House, the name-id is the Member's id from
          http://bioguide.congress.gov. This provides a unique
          identification for each Member of Congress for all
          time.

          The ids are unique and you should not assume anything
          about their numbering; if anything you should assume
          the numbering is random (although unique). Do not
          assume that the id for a given name will begin with a
          specific letter, because it won't always.
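In code, the advice above amounts to treating the name-id as an opaque key. The Python sketch below uses invented ids and names (they are not real Bioguide ids) and assumes ids are compared as exact strings; it looks a Member up by id and never inspects the characters inside the id.

```python
# Hypothetical ids and names, for illustration only.
members = {
    "X000001": "Jane Example",
    "Q000002": "John Sample",
}


def lookup(name_id):
    """Resolve an id by exact match only; never parse, sort, or guess from its contents."""
    return members.get(name_id)


print(lookup("X000001"))  # exact match finds the Member
print(lookup("E000001"))  # guessing an id from the surname initial finds nothing
```

Anything that needs the Member's name or initial should read it from the record the id points to, not from the id itself.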




