Re: [govtrack] Re: linking Govtrack data

  • Josh Tauberer
    Message 1 of 8, Jul 7, 2008
      dbsearch04 wrote:
      > DBPedia has this example query [in English]:
      >
      > "Mayors of US cities higher than 1000m"
      >
      > stored here: http://wikipedia.aksw.org/index.php?qid=9
      >
      > I don't know how to display the raw SPARQL for this. But I might want
      > to "join" this query with:
      >
      > "and where earmarks were proposed in the current Congress".
      >
      > I saw that earmarks are not yet in GovTrack, but this is an example
      > query I could dream up to "join" the disparate data sources.

      That's a really good example. (I hadn't read your follow-up until after
      sending my last email.)

      After some reverse engineering, the query is:

      PREFIX db: <http://dbpedia.org/property/>
      SELECT * WHERE {
        ?city db:leaderName ?leader ;
              db:subdivisionName ?subdiv ;
              db:elevation ?elevation .
      }

      When I substitute <http://dbpedia.org/resource/United_States> for
      ?subdiv, I get no results. The query page you linked to may be working
      off of a different version of the dataset than the one served by
      http://DBpedia.org/sparql.
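The substitution described above can be scripted. Below is a minimal Python sketch that builds the same query, optionally with ?subdiv bound to a fixed resource URI. The helper `mayors_query` is hypothetical. The optional `FILTER` stands in for the "higher than 1000m" part of the original question, and it only does something useful if `db:elevation` holds numeric literals in the dataset being queried. That is an assumption; the dataset may not store them that way.

```python
def mayors_query(subdiv_uri=None, min_elevation=None, limit=None):
    """Build the city/leader/elevation SPARQL query from this thread.

    subdiv_uri:    if given, replaces the ?subdiv variable with <subdiv_uri>.
    min_elevation: if given, adds a numeric FILTER on ?elevation (assumes
                   db:elevation values are numeric literals).
    limit:         if given, appends a LIMIT clause.
    """
    subdiv = f"<{subdiv_uri}>" if subdiv_uri else "?subdiv"
    lines = [
        "PREFIX db: <http://dbpedia.org/property/>",
        "SELECT * WHERE {",
        "  ?city db:leaderName ?leader ;",
        f"        db:subdivisionName {subdiv} ;",
        "        db:elevation ?elevation .",
    ]
    if min_elevation is not None:
        # float(...) rejects non-numeric input; ":g" prints 1000 as "1000".
        lines.append(f"  FILTER (?elevation > {float(min_elevation):g})")
    lines.append("}")
    query = "\n".join(lines)
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


if __name__ == "__main__":
    print(mayors_query("http://dbpedia.org/resource/United_States",
                       min_elevation=1000, limit=5))
```

Leaving every argument at its default reproduces the unbound query quoted above.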

      To try it, you could run the following, all on one line:

      wget -O - "http://DBpedia.org/sparql?query=PREFIX db: <http://dbpedia.org/property/> SELECT * WHERE { ?city db:leaderName ?leader ; db:subdivisionName ?subdiv ; db:elevation ?elevation . } LIMIT 5"
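Typing the query straight into the URL is fragile, because it contains spaces, newlines, braces and angle brackets. A minimal Python alternative is to let `urllib.parse.urlencode` do the escaping. It assumes, as the wget call does, that the endpoint accepts the query in a `query` GET parameter. The helper names are hypothetical. The `Accept` header assumes the endpoint can return SPARQL JSON results, which may not hold for every endpoint.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """PREFIX db: <http://dbpedia.org/property/>
SELECT * WHERE {
  ?city db:leaderName ?leader ;
        db:subdivisionName ?subdiv ;
        db:elevation ?elevation .
} LIMIT 5"""


def sparql_get_url(endpoint, query):
    # urlencode percent-escapes newlines, braces, angle brackets, ':' and '/',
    # and turns spaces into '+', so the query can keep its line breaks.
    return endpoint + "?" + urlencode({"query": query})


def run_query(endpoint, query, timeout=30):
    # Performs the actual HTTP GET and returns the raw response body.
    req = Request(sparql_get_url(endpoint, query),
                  headers={"Accept": "application/sparql-results+json"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the escaped URL; no network access happens here.
    print(sparql_get_url(ENDPOINT, QUERY))
```

Calling `run_query(ENDPOINT, QUERY)` sends the request. The printed URL can also be passed to wget as-is.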

      Earmark data is very rough around the edges, but there is a lot of it
      these days thanks to the work of Taxpayers for Common Sense and
      Sunlight. I suspect it would be possible to get it into a form that
      supports a query like the one you suggested.
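Suppose earmark records carried the same city identifiers as the SPARQL results. The "join" Roger describes would then be an ordinary keyed filter. The sketch below is only an illustration. It assumes hypothetical earmark records with `city` and `congress` fields; no such GovTrack feed is described in this thread. The example.org URIs are placeholders, not real resources.

```python
from collections import defaultdict


def cities_with_earmarks(city_rows, earmarks, congress):
    """Keep the city rows that have at least one earmark in `congress`.

    city_rows: dicts with a "city" URI, e.g. decoded SPARQL result rows.
    earmarks:  dicts with hypothetical "city" and "congress" fields; the
               city URI is assumed to already match the one in city_rows.
    """
    by_city = defaultdict(list)
    for e in earmarks:
        if e["congress"] == congress:
            by_city[e["city"]].append(e)
    # `in` on a defaultdict does not create keys, so cities without a
    # matching earmark are simply skipped.
    return [dict(row, earmarks=by_city[row["city"]])
            for row in city_rows if row["city"] in by_city]


if __name__ == "__main__":
    cities = [{"city": "http://example.org/city/A"},
              {"city": "http://example.org/city/B"}]
    marks = [{"city": "http://example.org/city/A", "congress": 110},
             {"city": "http://example.org/city/B", "congress": 109}]
    print(cities_with_earmarks(cities, marks, 110))
```

With these toy inputs, only city A is kept, because B's only earmark belongs to a different Congress.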

      As with any of the project ideas that come up on this list, if you're
      interested in hacking on it, please let me know 1) so I can be
      encouraging, and 2) so I can let you know if external funding to work on
      the project might be available.

      --
      - Josh Tauberer
      - GovTrack.us

      http://razor.occams.info

      "Yields falsehood when preceded by its quotation! Yields
      falsehood when preceded by its quotation!" Achilles to
      Tortoise (in "Gödel, Escher, Bach" by Douglas Hofstadter)
    • Roger Williams
      Message 2 of 8, Jul 8, 2008
        --- In govtrack@yahoogroups.com, Josh Tauberer <tauberer@...> wrote:
        > The data is marginally interlinked. For instance, their New York resource:
        > http://dbpedia.org/resource/New_York
        > is owl:sameAs'd to my New York resource:
        > http://www.rdfabout.com/rdf/usgov/geo/us/ny
        >
        > This was done based on their previously existing owl:sameAs links to the
        > Geonames dataset, and then the links I created between Geonames and my
        > Census data set (http://www.rdfabout.com/demo/census). That's the extent
        > of the interlinking --- otherwise the datasets don't connect at all. No
        > politician interlinking, for instance. (Also see:
        > http://wiki.dbpedia.org/Interlinking)
        >
        Yeah...I looked at this DBPedia page before. What confused me is
        that it has a bubble for "US Census Data" and then a greyed-out
        "GovTrack" bubble. There is no legend for this graphic.

        Do you know what the greyed-out bubbles mean? Do they indicate
        more than one "degree of separation"?

        Regards..RogerW
      • Roger Williams
        Message 3 of 8, Jul 8, 2008
          --- In govtrack@yahoogroups.com, Josh Tauberer <tauberer@...> wrote:
          >
          <snipped/>
          > My SemWeb library (http://razor.occams.info/code/semweb) almost does it.
          <snipped/>
          Hey Josh:

          I found out about GovTrack while looking into SemWeb. All of
          the "<snipped/>" info you provided above is very helpful to me.

          What interests me is that one part is not really programming at
          all: the "curation" piece. Coming up with all of the "knowledge
          links" between datasets should be a purely semantic task [maybe
          there are no tools to help].

          But I suspect code could help here, e.g. to link all of the
          "Clinton" references [in GovTrack] to "Hillary/Bill/New
          York/Arkansas" [in Wiki/DBPedia].

          For the kinds of things I think about, the first question is: how
          hard is this curation? Can it be done with Protege
          [http://protege.stanford.edu/]?

          Then if we do update GovTrack with all this, what happens when
          GovTrack is refreshed?

          Regards..RogerW
        • Josh Tauberer
          Message 4 of 8, Jul 8, 2008
            Roger Williams wrote:
            > --- In govtrack@yahoogroups.com, Josh Tauberer <tauberer@...> wrote:
            >> The data is marginally interlinked. For instance, their New York resource:
            >> http://dbpedia.org/resource/New_York
            >> is owl:sameAs'd to my New York resource:
            >> http://www.rdfabout.com/rdf/usgov/geo/us/ny
            >>
            >> This was done based on their previously existing owl:sameAs links to the
            >> Geonames dataset, and then the links I created between Geonames and my
            >> Census data set (http://www.rdfabout.com/demo/census). That's the extent
            >> of the interlinking --- otherwise the datasets don't connect at all. No
            >> politician interlinking, for instance. (Also see:
            >> http://wiki.dbpedia.org/Interlinking)
            >>
            > Yeah...I looked at this DBPedia page before. What I got confused
            > about is that it has a bubble for "US Census Data" and
            > then "GovTrack" is grey-ed. There is no legend for this graphic.
            >
            > Do you know what the "grey-ed" bubbles connote? Is this like more
            > than one "degree of separation"?

            I think so. The diagram is the generic Linking Open Data (LOD)
            diagram, which shows all of the datasets in the LOD community.
            Besides DBPedia, the highlighted bubbles are the datasets that
            DBPedia itself asserts owl:sameAs links to. "US Census Data" refers
            to my Census data set, which has the geographic identifiers, and
            hanging off of that (the GovTrack bubble) is the political side of
            my data.
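The DBpedia to Geonames to Census chain quoted above is why the links compose. If owl:sameAs is treated as symmetric and transitive, following the links amounts to finding connected components. Below is a minimal union-find sketch of that idea. The Geonames URI in the example is a hypothetical placeholder, since the actual identifier is not given in this thread.

```python
def same_as_groups(pairs):
    """Group URIs connected by owl:sameAs pairs, treating the links as
    symmetric and transitive (a small union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


if __name__ == "__main__":
    GEONAMES_NY = "urn:example:geonames-new-york"  # hypothetical placeholder
    links = [
        ("http://dbpedia.org/resource/New_York", GEONAMES_NY),          # DBpedia's own link
        (GEONAMES_NY, "http://www.rdfabout.com/rdf/usgov/geo/us/ny"),  # Geonames-to-Census link
    ]
    print(same_as_groups(links))
```

Both New York URIs land in one group even though neither dataset links to the other directly.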

            > But I can guess that this can be helped by code to link all of
            > the "Clinton" [in GovTrack] references to "Hillary/Bill/New
            > York/Arkansas" [in Wiki/DBPedia].
            >
            > With some stuff I think about, the first question is: how hard is
            > this curation? Can this be done with Protege
            > [http://protege.stanford.edu/]?

            Identifying connections isn't hard to do by hand, except that there
            are a lot of connections to identify, and it's more efficient if you
            can write a program to do it en masse. For Members of Congress, OK,
            there are only 535 now (435 Representatives plus 100 Senators), but
            there are some 10,000 looking back to 1789, plus there are some
            75,000 geographic entities that might be related between data sets.
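Here is a rough sketch of matching Members of Congress automatically. It indexes candidate resources by (normalized last name, state), accepts only unique matches, and leaves everything else for a human curator. In the "Clinton" example, the state is what separates the two candidates. The record fields, the `gt:example-*` ids, and the assumption that DBpedia candidates come with a state attached are all hypothetical. None of this reflects GovTrack's or DBpedia's actual schema.

```python
import unicodedata


def norm(s):
    # Lower-case and strip accents so e.g. "Rodríguez" matches "Rodriguez".
    s = unicodedata.normalize("NFKD", s)
    return "".join(c for c in s if not unicodedata.combining(c)).lower().strip()


def link_members(members, candidates):
    """Propose (member_id, candidate_uri) links.

    A member is linked only when exactly one candidate has the same
    normalized last name and state; all other members are returned in
    `unresolved`, together with whatever candidates matched, for manual review.
    """
    index = {}
    for c in candidates:
        index.setdefault((norm(c["last"]), c["state"]), []).append(c["uri"])
    links, unresolved = [], []
    for m in members:
        hits = index.get((norm(m["last"]), m["state"]), [])
        if len(hits) == 1:
            links.append((m["id"], hits[0]))
        else:
            unresolved.append((m["id"], hits))
    return links, unresolved


if __name__ == "__main__":
    members = [{"id": "gt:example-1", "last": "Clinton", "state": "NY"}]
    candidates = [
        {"uri": "http://dbpedia.org/resource/Hillary_Rodham_Clinton", "last": "Clinton", "state": "NY"},
        {"uri": "http://dbpedia.org/resource/Bill_Clinton", "last": "Clinton", "state": "AR"},
    ]
    print(link_members(members, candidates))
```

Requiring a unique match errs on the side of leaving links for a curator rather than asserting a wrong owl:sameAs.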

            > Then if we do update GovTrack with all this, what happens when
            > GovTrack is refreshed?

            Whatever connections are made would be stored in their own file,
            and since the identifiers for things are stable, the links would
            continue to make sense as the data is updated.
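Below is a minimal sketch of such a separate links file. It assumes the pairs come from a matching step like the ones discussed in this thread. The links are written as owl:sameAs statements in N-Triples, sorted and de-duplicated so that regenerating the file produces stable output. The function names and the file layout are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
_FORBIDDEN = set('<>"{}|^`\\ ')


def _iri(uri):
    # N-Triples wraps IRIs in <...>; refuse characters that would break that.
    if any(ch in _FORBIDDEN for ch in uri):
        raise ValueError(f"not usable as an N-Triples IRI: {uri!r}")
    return f"<{uri}>"


def links_to_ntriples(links):
    """Serialize (subject_uri, object_uri) pairs as owl:sameAs N-Triples."""
    lines = {f"{_iri(s)} {_iri(OWL_SAME_AS)} {_iri(o)} ." for s, o in links}
    return "".join(line + "\n" for line in sorted(lines))


def write_links(path, links):
    # The links live in their own file, so the main data dumps can be
    # regenerated without touching it; both sides use the same stable URIs.
    with open(path, "w", encoding="utf-8") as f:
        f.write(links_to_ntriples(links))
```

Because the output is sorted, a refresh that finds no new links rewrites a byte-identical file.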
