gonna need some SPARQL [help]

  • Roger Williams
    Message 1 of 7 , Jul 8, 2008
      Hello Josh:

      Wow, there's a lot of stuff out "there". Anyway, I am trying to get my
      head around some SPARQL. Debugging some large SQL queries can take 2
      beers!

      First, I want to get some data just from GovTrack. For a DB, I would
      ask for the "schema", but it seems like that doesn't make sense in this
      RDF stuff. So I just want to start with a couple of examples:

      1. list of states who have representatives who gave a speech on any
      bill during Feb 2008?, or
      2. show the dates of the bills introduced by reps from Missouri and
      voted "Yea!" on by any reps from Kentucky?

      Is there complicated SPARQL to extract "simple" reports like this?

      Also, I assume there is no string match in SPARQL (e.g. '%' in Oracle),
      right?

      TIA.

      Regards..
    • Josh Tauberer
      Message 2 of 7 , Jul 9, 2008
        Roger Williams wrote:
        > Hello Josh:
        >
        > Wow, there's a lot of stuff out "there". Anyway, I am trying to get
        > my head around some SPARQL. Debugging some large SQL queries can take
        > 2 beers!
        >
        > First, I want to get some data just from GovTrack. For a DB, I would
        > ask for the "schema", but it seems like that doesn't make sense in
        > this RDF stuff.

        There are schemas, it just doesn't really help you create the database:
        http://www.govtrack.us/share/vocabs.xpd

        > So I just want to start with a couple of examples:

        Okay, wish me luck. This isn't easy....

        > 1. list of states who have representatives who gave a speech on any
        > bill during Feb 2008?,

        I don't have speech data in RDF. I'll do instead "who introduced a bill
        during Feb 2008". Basically it took me all morning to get this to work
        in principle, and it's still not working. Because of an inconsistency in
        the data model for representatives at-large versus for a district, a
        UNION is necessary to relate the reps to the name of the state they are
        in, but it doesn't seem to be working, so I've commented out half of the
        UNION. But even with parts commented out, the query is taking too long
        to process and times out. It would be faster if my endpoint wasn't
        composed of a handful of separate databases that are being federated.

        The query is:

        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        # dc: is used for dc:title below but was never declared; the
        # standard Dublin Core elements namespace is assumed here.
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        PREFIX dcterms: <http://purl.org/dc/terms/>
        PREFIX usgov: <http://www.rdfabout.com/rdf/schema/usgovt/>
        PREFIX bill: <http://www.rdfabout.com/rdf/schema/usbill/>
        PREFIX pol: <http://www.rdfabout.com/rdf/schema/politico/>
        PREFIX time: <http://pervasive.semanticweb.org/ont/2004/06/time#>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

        SELECT ?name ?statename ?billtitle ?introduceddate WHERE {
          # Get current representatives and district they represent.
          ?person usgov:name ?name ;
                  pol:hasRole [
                    time:to [ time:at "2008-12-31"^^xsd:date ] ;
                    pol:forOffice [ pol:represents ?district ]
                  ] .

          # Match the people to the bills they sponsor and get the
          # introduced date of the bill.
          ?bill bill:title ?billtitle ;
                bill:sponsor ?person ;
                bill:introduced ?introduceddate .

          # Limit introduced date to a certain time. This is a hackish
          # way to do it, but probably the fastest to execute for my
          # endpoint.
          FILTER(regex(?introduceddate, "^2008-02")) .

          # A representative can represent either a state or a district,
          # but we want to list state names, so we have to test for
          # either possibility.
          {
            # If it's within a state, use the name of the state.
            ?district dcterms:isPartOf [
              rdf:type usgov:State ;
              dc:title ?statename ] .
          } UNION {
            # If district itself is a state, use its name.
            ?district rdf:type usgov:State ;
                      dc:title ?statename .
          }
        }
        LIMIT 30
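
An editorial aside on the date filter: the regex() test above treats the date as text. If the endpoint stores bill:introduced as a typed xsd:date and can order such values (neither is confirmed in this thread, and engines differ on whether xsd:date comparison is supported), a half-open date range is another way to say "during Feb 2008". The Python sketch below only builds such a FILTER clause as a string; month_filter is a hypothetical helper name, not part of any library.

```python
from datetime import date


def month_filter(var: str, year: int, month: int) -> str:
    """Return a SPARQL FILTER clause limiting ?var to one calendar month.

    Assumes the variable is bound to xsd:date literals and that the
    endpoint can compare them; both are assumptions, not facts from
    the thread.
    """
    start = date(year, month, 1)
    # First day of the following month (rolls over to January after December).
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return (
        f'FILTER(?{var} >= "{start.isoformat()}"^^xsd:date && '
        f'?{var} < "{end.isoformat()}"^^xsd:date)'
    )


feb_2008 = month_filter("introduceddate", 2008, 2)
print(feb_2008)
```

The returned clause could stand in for the regex() line in the query above, since the xsd: prefix is already declared there. Whether it runs faster or slower than the regex on a given endpoint would have to be measured.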

        > 2. show the dates of the bills introduced
        > by reps from Missouri and voted "Yea!" on by any reps from Kentucky?

        This time I'll limit it to senators and representatives at-large to
        avoid the UNION. Also there didn't seem to be any bills from Missouri
        actually voted on (can it be? maybe data is wrong) so I reversed the states.

        PREFIX usgov: <http://www.rdfabout.com/rdf/schema/usgovt/>
        PREFIX bill: <http://www.rdfabout.com/rdf/schema/usbill/>
        PREFIX pol: <http://www.rdfabout.com/rdf/schema/politico/>
        PREFIX vote: <http://www.rdfabout.com/rdf/schema/vote/>

        SELECT ?title ?date WHERE {
          # Sponsor: someone whose office represents Kentucky.
          ?person1
            #usgov:name ?name1 ;
            pol:hasRole [
              pol:forOffice [ pol:represents
                <http://www.rdfabout.com/rdf/usgov/geo/us/ky> ]
            ] .

          # Voter: someone whose office represents Missouri.
          ?person2
            #usgov:name ?name2 ;
            pol:hasRole [
              pol:forOffice [ pol:represents
                <http://www.rdfabout.com/rdf/usgov/geo/us/mo> ]
            ] .

          ?bill bill:title ?title ;
                bill:sponsor ?person1 ;
                bill:introduced ?date .

          # The bill had a vote in which ?person2 cast an "Aye" ballot.
          ?bill bill:hadAction [
            bill:vote [
              vote:hasBallot [ vote:voter ?person2 ; vote:option "Aye" ]
            ]
          ] .
        }
        LIMIT 30

        > Also, I assume there is no string match in SPARQL (e.g. '%' in
        > Oracle), right?

        See the FILTER with the regex() call in the first example.
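
As an editorial illustration of that answer: a SQL LIKE pattern maps fairly directly onto the kind of anchored regular expression that regex() takes. The Python sketch below does the translation; like_to_regex is a hypothetical helper, and it escapes only a small set of common regex metacharacters, so treat it as a starting point and not a complete converter.

```python
import re

# Characters treated as regex metacharacters and therefore backslash-escaped.
META = set(r"\.^$*+?()[]{}|")


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")       # % matches any run of characters
        elif ch == "_":
            parts.append(".")        # _ matches exactly one character
        elif ch in META:
            parts.append("\\" + ch)  # literal metacharacter
        else:
            parts.append(ch)
    return "^" + "".join(parts) + "$"


rx = like_to_regex("2008-02%")
print(rx)                                  # ^2008-02.*$
print(bool(re.match(rx, "2008-02-14")))    # True
print(bool(re.match(rx, "2008-03-01")))    # False
```

In a query, the resulting pattern would go inside the FILTER, for example FILTER(regex(?introduceddate, "^2008-02.*$")). A pattern containing backslashes would need them doubled inside a SPARQL string literal.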

        --
        - Josh Tauberer
        - GovTrack.us

        http://razor.occams.info

        "Yields falsehood when preceded by its quotation! Yields
        falsehood when preceded by its quotation!" Achilles to
        Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
      • Roger Williams
        Message 3 of 7 , Jul 10, 2008
          OK..sir..this is a gigantic help..

          I gotta learn this stuff before I can even "pronounce" a project I
          want to work on. I downloaded semweb, built it and started looking at
          the code.

          I will study the queries you have supplied so far and see if I can
          understand and modify them. I was hoping the schemas could help me
          write queries. No matter.

          The current path I am sniffing right now is that queries like these
          [or the questions that are equivalent to these queries] and the
          results are one component of "knowledge".

          I can guess that we are a ways from an Ask Jeeves for SPARQL, but
          even those simple DBPedia queries seem a more "intelligent" way to
          learn from Wikipedia than keyword search [by itself].

          If you know any online cheat sheet for SPARQL, I would appreciate it
          [I ordered a book or three on some of this semantic stuff].

          I will start trying to work with the GovTrack data [thru SPARQL].
          That will help me "gel" my thoughts about possible ideas.

          Thanks again...more later..

          Regards..RogerW
          --- In govtrack@yahoogroups.com, Josh Tauberer <tauberer@...> wrote:
          >
          > <snipped/>
          >
          > There are schemas, it just doesn't really help you create the database:
          > http://www.govtrack.us/share/vocabs.xpd
          >
          > <snipped/>
          >
        • Roger Williams
          Message 4 of 7 , Jul 15, 2008
            --- In govtrack@yahoogroups.com, Josh Tauberer <tauberer@...> wrote:
            > <Snipped/>
            > I don't have speech data in RDF. I'll do instead "who introduced a bill
            > <snipped/>
            >

            The reason I used speeches is that I saw them on GovTrack on a page
            like this:
            http://www.govtrack.us/congress/person.xpd?tab=speeches&id=400427

            Does this mean it is scraped but not "tripled"?

            I also thought that this is guaranteed to be "on the floor". Maybe you
            already have attendance figures.

            TIA.

            Regards..RogerW
          • Josh Tauberer
            Message 5 of 7 , Jul 16, 2008
              Roger Williams wrote:
              > --- In govtrack@yahoogroups.com, Josh Tauberer <tauberer@...> wrote:
              >> <Snipped/>
              >> I don't have speech data in RDF. I'll do instead "who introduced a bill
              >> <snipped/>
              >>
              >
              > The reason I used speeches is because I saw this out on GovTrack on a
              > page like this:
              > http://www.govtrack.us/congress/person.xpd?tab=speeches&id=400427
              >
              > Does this mean it is scraped but not "tripled"?

              Yes. The conversion to RDF is a side process basically unrelated to the
              rest of the website.

              > I also thought that this is guaranteed to be "on the floor". Maybe you
              > already have attendance figures.

              "Speeches" here means statements which went into the Congressional
              Record, which is somewhere between a transcript of floor activity and a
              collection of press releases (but closer to the former).

              --
              - Josh Tauberer
              - GovTrack.us

              http://razor.occams.info

              "Yields falsehood when preceded by its quotation! Yields
              falsehood when preceded by its quotation!" Achilles to
              Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
            • Roger Williams
              Message 6 of 7 , Jul 20, 2008
                Hello Josh:

                I got the small DBPedia query you gave me to work with wget [using
                cygwin]. Thanks for this 'baby step'.

                But for some reason I cannot get this same method to work with
                rdfabout.com. I tried this [of course, all on one line]:

                wget -O - http://www.rdfabout.com/sparql?query="PREFIX rdf:
                <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX dcterms:
                <http://purl.org/dc/terms/> PREFIX usgov:
                <http://www.rdfabout.com/rdf/schema/usgovt/> PREFIX bill:
                <http://www.rdfabout.com/rdf/schema/usbill/> SELECT ?name ?person
                WHERE { ?person usgov:name ?name . } LIMIT 15"

                The error text is shown below. It looks like something "urlencoded"
                the string and it is choking on '%20'. It appears to do the same
                encoding thing on the DBPedia parameter. Is this a restriction on
                rdfabout.com?

                When I use the SPARQL page (http://www.govtrack.us/sparql.xpd) on the
                GovTrack site, this query works fine:

                ----------------------------------------------------------
                PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                PREFIX dcterms: <http://purl.org/dc/terms/>
                PREFIX usgov: <http://www.rdfabout.com/rdf/schema/usgovt/>
                PREFIX bill: <http://www.rdfabout.com/rdf/schema/usbill/>

                SELECT ?name ?person WHERE {
                # Get current representatives and district they represent.
                ?person usgov:name ?name .
                }
                LIMIT 15
                ----------------------------------------------------------

                Should I be able to run this with wget?

                FYI: This query is a butchering of one of the complicated queries you
                supplied to me earlier. I was able to get the second one to work, but
                the answer was only 1 bill. I am not sure if this is correct.

                TIA.

                Regards..RogerW
                --------------------------------------
                Error from wget:

                --2008-07-20 17:36:56-- http://www.rdfabout.com/sparql?query=PREFIX%20rdf:%20%3Chttp://www.w3.org/1999/02/22-rdf-syntax-ns
                Resolving www.rdfabout.com... 72.249.66.164
                Connecting to www.rdfabout.com|72.249.66.164|:80... connected.
                HTTP request sent, awaiting response... 400 SPARQL syntax error: Encountered "<" at line 1, column 13.
                2008-07-20 17:36:56 ERROR 400: SPARQL syntax error: Encountered "<" at line 1, column 13..
              • Josh Tauberer
                Message 7 of 7 , Jul 21, 2008
                  Roger Williams wrote:
                  > But for some reason I cannot get this same method to work with
                  > rdfabout.com. I tried this [of course, all on one line]:
                  ...
                  > The error text is shown below. It looks like something "urlencoded"
                  > the string and it is choking on '%20'. It appears to do the same
                  > encoding thing on the DBPedia parameter. Is this a restriction on
                  > rdfabout.com?
                  >
                  > When I use the SPARQL page (http://www.govtrack.us/sparql.xpd) on the
                  > GovTrack site, this query works fine:

                  Shell escaping and multi-line things and url encoding and whatever is
                  just... well, I'm going to sit this one out. I have no idea. :)

                  --
                  - Josh Tauberer
                  - GovTrack.us

                  http://razor.occams.info

                  "Yields falsehood when preceded by its quotation! Yields
                  falsehood when preceded by its quotation!" Achilles to
                  Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
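
An editorial note on the wget failure in message 6: the request URL in the log stops at "22-rdf-syntax-ns", right before the "#" that ends the rdf: namespace. One plausible reading is that the unescaped "#" was taken as the start of a URL fragment, so the server only saw a truncated query whose "<" at column 13 never closes. Percent-encoding the whole query parameter would avoid that. The sketch below uses Python's standard library to build such a URL. The endpoint address is the one from the thread and may no longer be live, and build_url is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the thread; its current availability is not verified.
ENDPOINT = "http://www.rdfabout.com/sparql"

QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX usgov: <http://www.rdfabout.com/rdf/schema/usgovt/>
SELECT ?name ?person WHERE { ?person usgov:name ?name . } LIMIT 15
"""


def build_url(endpoint: str, query: str) -> str:
    """Return a GET URL with the SPARQL text percent-encoded as ?query=..."""
    # urlencode escapes '#', '<', '>', spaces and newlines in the value.
    return endpoint + "?" + urlencode({"query": query})


url = build_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again should give back the original query text.
round_trip = parse_qs(urlsplit(url).query)["query"][0]
print(round_trip == QUERY)
```

The printed URL can be passed to wget inside single quotes, or fetched directly with urllib.request.urlopen(url). The response format depends on what the endpoint returns by default.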