linking Govtrack data

  • dbsearch04
    Message 1 of 8 , Jul 4, 2008
      Hello Josh:

      I am a programmer who recently started [again] looking at the semantic
      web stuff. An interesting site I came across is DBPedia. They have
      indexed the Wikipedia content and process SPARQL queries against that
      content.

      I thought one of the big "features" of the Semantic Web was
      integration with other "rdf'ed" data stores. So I have what feels
      like a dumb question:

      Is it difficult to get the results from a SPARQL query that would
      link the Govtrack data with the DBPedia data?

      And a little less dumb:

      Is this only "technically" feasible, and do I need to find/make some
      kind of "federated query processor" to realize this?

      TIA.

      Regards..RogerW
    • Harvey Frey
      Message 2 of 8 , Jul 4, 2008
        Roger:
         
            The DBPedia has a lot of general info, but I don't see much relating to Law, per se, which is what we'd need.
         
            It seems that both DBPedia and Yago depend heavily on the "ontology" previously constructed by Wikipedia, as well as the facts edited there. Where could we find such a structure for US Law?
         
            The Estrella project and MetaLex at the Leibniz Center for Law, as well as CEN MetaLex, are oriented toward Europe and pretty abstract. Norme in Rete is specifically Italian, and Akoma Ntoso is aimed at Africa.
         
            Is there any open-source group in the US which has done similar work we could build on? I'm sure that Lexis-Nexis and Westlaw have wonderful databases, but they're proprietary.
         
        Harvey
         
      • dbsearch04
        Message 3 of 8 , Jul 5, 2008
          Thanks Harvey:

          DBPedia has this example query [in English]:

          "Mayors of US cities higher than 1000m"

          stored here: http://wikipedia.aksw.org/index.php?qid=9

          I don't know how to display the raw SPARQL for this. But I might want
          to "join" this query with:

          "and where earmarks were proposed in the current Congress".

          I saw that earmarks are not yet in GovTrack, but this is an example
          query I could dream up to "join" the disparate data sources.

          Regards..
        • Josh Tauberer
          Message 4 of 8 , Jul 7, 2008
            dbsearch04 wrote:
            > I am a programmer who recently started [again] looking at the semantic
            > web stuff. An interesting site I came across is DBPedia. They have
            > indexed the WikiPedia content and process SPARQL queries against that
            > content.

            Yes, it's a neat project. I've talked to the guys behind that a little bit.

            > I thought one of the big "features" of the Semantic Web was
            > integration with other "rdf'ed" data stores. So I have what feels
            > like a dumb question:
            >
            > Is it difficult to get the results from a SPARQL query that would
            > link the Govtrack data with the DBPedia data?

            As you raised in your second question, there are two sides to this:
            first, whether the data is connected, and second, whether it is
            technically feasible to query across both data sets at once.

            The data is marginally interlinked. For instance, their New York resource:
            http://dbpedia.org/resource/New_York
            is owl:sameAs'd to my New York resource:
            http://www.rdfabout.com/rdf/usgov/geo/us/ny

            This was done based on their previously existing owl:sameAs links to the
            Geonames dataset, and then the links I created between Geonames and my
            Census data set (http://www.rdfabout.com/demo/census). That's the extent
            of the interlinking --- otherwise the datasets don't connect at all. No
            politician interlinking, for instance. (Also see:
            http://wiki.dbpedia.org/Interlinking)

            > Is this only "technically" feasible and do I need to find/make some
            > kind of "federated query processor" to realize this.

            If you wanted to issue a single query, then yes you would need some sort
            of federated query processor. You would also need to construct the query
            creatively to deal with the owl:sameAs links.

            My SemWeb library (http://razor.occams.info/code/semweb) almost does it.
            I had to fix a few bugs just now, and the query takes a good amount
            of time to get through. Plus it took me about 15 tries to get the query
            right, and in the process I seem to have taken DBpedia down twice and
            my SPARQL endpoint down once. (The query processor decides to issue some
            very unfriendly queries to the endpoints if the query doesn't say exactly
            what you intend.)

            What I eventually came up with is this:

            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX dc: <http://purl.org/dc/elements/1.1/>
            PREFIX owl: <http://www.w3.org/2002/07/owl#>
            PREFIX census: <http://www.rdfabout.com/rdf/schema/census/>

            SELECT ?population ?person WHERE {
            ?ny1 dc:title "New York" .
            ?ny1 census:population ?population .
            ?ny2 owl:sameAs ?ny1 .
            ?person <http://dbpedia.org/property/birthPlace> ?ny2 .
            }

            In English, for the state named "New York", tell me its population (my
            SPARQL endpoint) and everyone born in it (DBPedia). It's a dumb query,
            but it's a starting point.
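
To make the join concrete, here is a minimal Python sketch of what a federated processor has to do with that query. It is not how SemWeb implements it. The two lists are toy stand-ins for the two endpoints (the URIs and the population figure are the ones mentioned in this thread), the placement of the owl:sameAs triple on the DBpedia side is an assumption, and `match`, `ask_all`, and `federated_query` are hypothetical helper names.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

DC_TITLE = "http://purl.org/dc/elements/1.1/title"
CENSUS_POP = "http://www.rdfabout.com/rdf/schema/census/population"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
DBP_BIRTHPLACE = "http://dbpedia.org/property/birthPlace"

NY_CENSUS = "http://www.rdfabout.com/rdf/usgov/geo/us/ny"
NY_DBPEDIA = "http://dbpedia.org/resource/New_York"

# Toy stand-in for the GovTrack/Census endpoint (assumed contents).
GOVTRACK_SIDE: List[Triple] = [
    (NY_CENSUS, DC_TITLE, "New York"),
    (NY_CENSUS, CENSUS_POP, "18976457"),
]

# Toy stand-in for the DBpedia endpoint (assumed contents).
DBPEDIA_SIDE: List[Triple] = [
    (NY_DBPEDIA, OWL_SAMEAS, NY_CENSUS),
    ("http://dbpedia.org/resource/Kareem_Abdul-Jabbar", DBP_BIRTHPLACE, NY_DBPEDIA),
]


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a variable."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def ask_all(sources, s=None, p=None, o=None) -> Iterator[Triple]:
    """Send one triple pattern to every source, as a federated processor must."""
    for triples in sources:
        yield from match(triples, s, p, o)


def federated_query(title: str, sources) -> List[Tuple[str, str]]:
    """Evaluate the four patterns of the query above, one nested join at a time."""
    rows = []
    for ny1, _, _ in ask_all(sources, p=DC_TITLE, o=title):
        for _, _, population in ask_all(sources, s=ny1, p=CENSUS_POP):
            for ny2, _, _ in ask_all(sources, p=OWL_SAMEAS, o=ny1):
                for person, _, _ in ask_all(sources, p=DBP_BIRTHPLACE, o=ny2):
                    rows.append((population, person))
    return rows


if __name__ == "__main__":
    for row in federated_query("New York", [GOVTRACK_SIDE, DBPEDIA_SIDE]):
        print(row)
```

Every pattern goes to every source because the processor does not know in advance which endpoint holds which predicate. Against real endpoints, a loosely bound pattern can therefore turn into a very large request, which is probably the "unfriendly queries" effect.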

            I ran the query using this command with my library, which uses the
            probably-undocumented "|" feature to create a sort of distributed data
            source:

            mono rdfquery.exe -type sparql \
            "sparql-http:http://www.govtrack.us/sparql|sparql-http:http://DBpedia.org/sparql" \
            < govtrack-dbpedia.sparql

            and it came back with a bunch of results, including one row with
            ?population as 18976457 and ?person as
            http://dbpedia.org/resource/Kareem_Abdul-Jabbar.

            Being able to issue more interesting queries would be nice, and having
            more resources in the two datasets interlinked would be very helpful. It
            wouldn't be too difficult to interlink politicians since Wikipedians
            have done a fair job of adding template content to politician pages that
            includes the identifiers used for the politicians on GovTrack,
            VoteSmart, Bioguide, and possibly elsewhere.
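
A rough Python sketch of that interlinking step, assuming each side can be reduced to (URI, Bioguide ID) pairs. The example.org URIs and the IDs in the usage note are made up for illustration, and `link_by_shared_id` and `to_ntriples` are hypothetical names, not part of any existing tool.

```python
from typing import Iterable, List, Tuple

OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

Triple = Tuple[str, str, str]


def link_by_shared_id(govtrack_people: Iterable[Tuple[str, str]],
                      dbpedia_people: Iterable[Tuple[str, str]]) -> List[Triple]:
    """Pair up URIs from the two datasets that carry the same identifier.

    Each input is an iterable of (uri, bioguide_id) pairs.
    """
    # Index one side by identifier so the other side is a single pass.
    by_id = {bioguide: uri for uri, bioguide in dbpedia_people}
    links = []
    for uri, bioguide in govtrack_people:
        other = by_id.get(bioguide)
        if other is not None:
            links.append((uri, OWL_SAMEAS, other))
    return links


def to_ntriples(links: Iterable[Triple]) -> str:
    """Serialize URI-to-URI links as N-Triples, one statement per line."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in links)
```

For example, with made-up inputs `[("http://example.org/govtrack/person/1", "X000001")]` and `[("http://example.org/dbpedia/Person_A", "X000001")]`, this produces one owl:sameAs statement linking the two URIs. Identifiers that appear on only one side are skipped.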

            There are some projects there, then, if you're interested.

            --
            - Josh Tauberer
            - GovTrack.us

            http://razor.occams.info

            "Yields falsehood when preceded by its quotation! Yields
            falsehood when preceded by its quotation!" Achilles to
            Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
          • Josh Tauberer
            Message 5 of 8 , Jul 7, 2008
              dbsearch04 wrote:
              > DBPedia has this example query [in English]:
              >
              > "Mayors of US cities higher than 1000m"
              >
              > stored here: http://wikipedia.aksw.org/index.php?qid=9
              >
              > I don't know how to display the raw SPARQL for this. But I might want
              > to "join" this query with:
              >
              > "and where earmarks were proposed in the current Congress".
              >
              > I saw that earmarks are not yet in GovTrack, but this is an example
              > query I could dream up to "join" the disparate data source.

              That's a really good example. (I hadn't read your follow-up until after
              sending my last email.)

              Doing some reverse engineering, the query is:

              PREFIX db: <http://dbpedia.org/property/>
              SELECT * WHERE {
              ?city db:leaderName ?leader ;
              db:subdivisionName ?subdiv ;
              db:elevation ?elevation .
              }

              When I put <http://dbpedia.org/resource/United_States> for ?subdiv I get
              no results. It might be that the query page you linked to is working off
              of a different version of the dataset than what is served by
              http://DBpedia.org/sparql.

              To try it, you could do:

              wget -O - http://DBpedia.org/sparql?query="PREFIX db:
              <http://dbpedia.org/property/>
              SELECT * WHERE {
              ?city db:leaderName ?leader ;
              db:subdivisionName ?subdiv ;
              db:elevation ?elevation .
              } LIMIT 5"

              all on one line.
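
If quoting the query for the shell gets awkward, the same GET URL can be built in Python. This is a sketch under the assumption that the endpoint takes the query in a `query` parameter, as in the wget line above. `build_query_url` and `extract_query` are hypothetical helper names, and nothing here actually contacts the endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """PREFIX db: <http://dbpedia.org/property/>
SELECT * WHERE {
  ?city db:leaderName ?leader ;
        db:subdivisionName ?subdiv ;
        db:elevation ?elevation .
} LIMIT 5"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL with the SPARQL text percent-encoded in ?query=..."""
    return endpoint + "?" + urlencode({"query": query})


def extract_query(url: str) -> str:
    """Decode the query parameter again, as a round-trip sanity check."""
    return parse_qs(urlsplit(url).query)["query"][0]


if __name__ == "__main__":
    # Print the URL; pass it to wget, curl, or urllib.request.urlopen to run it.
    print(build_query_url(ENDPOINT, QUERY))
```

Encoding the query this way also means it no longer has to be squeezed onto one line.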

              Earmark data is very rough around the edges, but there is a lot of it
              these days thanks to the work of Taxpayers for Common Sense, and
              Sunlight, and I suspect that it would be possible to get it into a form
              where a query like the one you suggested would be possible.

              As with any of the project ideas that come up on this list, if you're
              interested in hacking on it, please let me know 1) so I can be
              encouraging, and 2) so I can let you know if external funding to work on
              the project might be available.

              --
              - Josh Tauberer
              - GovTrack.us
            • Roger Williams
              Message 6 of 8 , Jul 8, 2008
                --- In govtrack@yahoogroups.com, Josh Tauberer <tauberer@...> wrote:
                > The data is marginally interlinked. For instance, their New York
                resource:
                > http://dbpedia.org/resource/New_York
                > is owl:sameAs'd to my New York resource:
                > http://www.rdfabout.com/rdf/usgov/geo/us/ny
                >
                > This was done based on their previously existing owl:sameAs links
                to the
                > Geonames dataset, and then the links I created between Geonames and
                my
                > Census data set (http://www.rdfabout.com/demo/census). That's the
                extent
                > of the interlinking --- otherwise the datasets don't connect at
                all. No
                > politician interlinking, for instance. (Also see:
                > http://wiki.dbpedia.org/Interlinking)
                >
                Yeah...I looked at this DBPedia page before. What I got confused
                about is that it has a bubble for "US Census Data" and
                then "GovTrack" is grey-ed. There is no legend for this graphic.

                Do you know what the "grey-ed" bubbles connote? Is this like more
                than one "degree of separation"?

                Regards..RogerW
              • Roger Williams
                Message 7 of 8 , Jul 8, 2008
                  --- In govtrack@yahoogroups.com, Josh Tauberer <tauberer@...> wrote:
                  >
                  <snipped/>
                  > My SemWeb library (http://razor.occams.info/code/semweb) almost
                  does it.
                  <snipped/>
                  Hey Josh:

                  I found out about GovTrack while looking into SemWeb. All of
                  the "<snipped/>" info you provided above is very helpful to me.

                  It is interesting to me that one part is not really programming at
                  all. This is the "curation" piece. Coming up with all of
                  the "knowledge links" between datasets should just be a semantic task
                  [maybe there are no tools to help].

                  But I can guess that this can be helped by code to link all of
                  the "Clinton" [in GovTrack] references to "Hillary/Bill/New
                  York/Arkansas" [in Wiki/DBPedia].

                  With some stuff I think about, the first question is: how hard is
                  this curation? Can this be done with Protege
                  [http://protege.stanford.edu/]?

                  Then if we do update GovTrack with all this, what happens when
                  GovTrack is refreshed?

                  Regards..RogerW
                • Josh Tauberer
                  Message 8 of 8 , Jul 8, 2008
                    Roger Williams wrote:
                    > --- In govtrack@yahoogroups.com, Josh Tauberer <tauberer@...> wrote:
                    >> The data is marginally interlinked. For instance, their New York
                    > resource:
                    >> http://dbpedia.org/resource/New_York
                    >> is owl:sameAs'd to my New York resource:
                    >> http://www.rdfabout.com/rdf/usgov/geo/us/ny
                    >>
                    >> This was done based on their previously existing owl:sameAs links
                    > to the
                    >> Geonames dataset, and then the links I created between Geonames and
                    > my
                    >> Census data set (http://www.rdfabout.com/demo/census). That's the
                    > extent
                    >> of the interlinking --- otherwise the datasets don't connect at
                    > all. No
                    >> politician interlinking, for instance. (Also see:
                    >> http://wiki.dbpedia.org/Interlinking)
                    >>
                    > Yeah...I looked at this DBPedia page before. What I got confused
                    > about is that it has a bubble for "US Census Data" and
                    > then "GovTrack" is grey-ed. There is no legend for this graphic.
                    >
                    > Do you know what the "grey-ed" bubbles connote? Is this like more
                    > than one "degree of separation"?

                    Yes, I think. The diagram is the generic Linking Open Data diagram that
                    shows all of the datasets in the LOD community. Besides DBPedia, the
                    highlighted ones are those which DBPedia itself asserts owl:sameAs links
                    to. The US Census Data refers to my Census data set which has the
                    geographic identifiers, and then hanging off of that (the GovTrack
                    bubble) is the political side to my data.

                    > But I can guess that this can be helped by code to link all of
                    > the "Clinton" [in GovTrack] references to "Hillary/Bill/New
                    > York/Arkansas" [in Wiki/DBPedia].
                    >
                    > With some stuff I think about, the first question is: how hard is
                    > this curation? Can this be done with Protege
                    > [http://protege.stanford.edu/]?

                    Identifying connections isn't hard to do by hand except that there are a
                    lot of connections to identify, and it's more efficient if you can write
                    a program to do it en masse. For Members of Congress, ok, there are only
                    535 now, but there are some 10,000 looking back to 1789,
                    plus there are some 75,000 geographic entities that might be related
                    between data sets.

                    > Then if we do update GovTrack with all this, what happens when
                    > GovTrack is refreshed?

                    Whatever connections are made would be stored in its own file, and since
                    the identifiers for things are stable, it would continue to make sense
                    as data is updated.
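
As an illustration only, the "own file" idea could look like the following Python sketch. The N-Triples file format, the file layout, and the helper names `save_links`, `load_links`, and `refresh` are assumptions, not a description of how GovTrack actually stores anything.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Matches a URI-only N-Triples statement: <s> <p> <o> .
_LINE = re.compile(r"^<([^>]+)> <([^>]+)> <([^>]+)> \.$")


def save_links(path: str, links: Iterable[Triple]) -> None:
    """Write hand-made or program-made links to their own file."""
    text = "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in links)
    Path(path).write_text(text, encoding="utf-8")


def load_links(path: str) -> List[Triple]:
    """Read the links back, ignoring blank or malformed lines."""
    links = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        m = _LINE.match(line.strip())
        if m:
            links.append((m.group(1), m.group(2), m.group(3)))
    return links


def refresh(new_dump: Iterable[Triple], links_path: str) -> List[Triple]:
    """Combine a freshly regenerated dump with the separately stored links.

    The links file is never rewritten by a refresh, so the links survive
    as long as the URIs they mention stay stable.
    """
    return list(new_dump) + load_links(links_path)
```

The generated data can be thrown away and rebuilt at will. Only the links file needs to be kept, and it stays valid as long as neither side renames its resources.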

                    --
                    - Josh Tauberer
                    - GovTrack.us