
Re: [rest-discuss] Why SPARQL endpoints aren't even remotely RESTful.

  • Eric J. Bowman
    Message 1 of 22, Feb 3, 2011
      Nathan wrote:
      >
      > Eric J. Bowman wrote:
      > > There is one way I can think of to use SPARQL queries in a REST app,
      > > which is to POST or PUT a representation as
      > > application/sparql-query to some URI. Dereferencing that URI
      > > executes the query as a stored procedure, returning
      > > application/sparql-results+xml by default, but can also return the
      > > original query with Accept: application/sparql-query.
      >
      > Indeed :) one little question though, what happens when somebody GETs
      > the URI?
      >

      "Dereferencing that URI executes the query..."

      >
      > For example, given such a scenario I'd quite like to send people back
      > some HTML, with a form in it, that allowed them to run test SPARQL
      > queries and get back the "raw results", say by putting the query in a
      > form element and submitting the form. Sound feasible / RESTful? if
      > so, POST/PUT or GET?
      >

      You're confusing me. ;-) My scenario executes one stored SPARQL query
      (by default, unless Accept: application/sparql-query) at a fixed URI.
      POST or PUT can create that URI (depending on whether the user-agent or
      the origin server assigns the URI), by uploading the query as a file.
      PUT may be used to replace the query, i.e. edit the file. Note that
      PUT will only Allow: application/sparql-query -- you can't edit the
      result, PUT that back, and expect the server to reformulate the query.

      Standard REST design pattern, I've used it with PHP, XQuery, SSJS, JSP,
      ASP... not always self-descriptively, as PHP etc. lack media types, and
      always access-restricted for methods other than GET/HEAD/OPTIONS.

      You're saying you want GET to return HTML results with a form? Fine,
      add text/html to the conneg mix, return the results with a form in
      that representation, pre-fill the textarea with the current raw SPARQL
      query, and instruct the user-agent to PUT application/sparql-query
      to the URI upon submission.
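      The stored-query resource described above can be sketched in a few
      lines. This is a minimal illustration under stated assumptions: the
      dispatch function, the stand-in query engine, and the default query
      are all hypothetical, not Eric's actual implementation.

      ```python
      # Sketch of a stored-query resource: GET returns results (or the raw
      # query, or an HTML form, via conneg); PUT replaces the stored query.
      # All names here are illustrative assumptions, not real code from the
      # thread.

      stored_query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"  # hypothetical

      def execute(query):
          # Stand-in for a real SPARQL engine returning sparql-results+xml.
          return "<sparql xmlns='http://www.w3.org/2005/sparql-results#'/>"

      def handle(method, accept="application/sparql-results+xml", body=None):
          global stored_query
          if method == "PUT":
              # Only the query itself may be replaced, never the results.
              stored_query = body
              return 204, None, None
          if method == "GET":
              if accept == "application/sparql-query":
                  return 200, accept, stored_query      # the raw query
              if accept == "text/html":
                  # Results plus a form pre-filled with the current query;
                  # a script would PUT application/sparql-query on submit.
                  form = ("<form><textarea name='query'>%s</textarea></form>"
                          % stored_query)
                  return 200, accept, execute(stored_query) + form
              return 200, accept, execute(stored_query)  # default
          return 405, None, None  # e.g. POST to an existing stored query
      ```

      The point of the sketch is that the query never leaks into the URI:
      one fixed URI, with the representation chosen entirely by conneg.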

      >
      > ps: a little confused after reading the above "one way I can think of
      > to use SPARQL queries in a REST app, which is to POST or.." and the
      > mail you sent immediately before it saying "Please don't confuse a
      > post about how POST isn't unRESTful, as saying that it's ever even
      > remotely OK to use POST as a retrieval method." - I'm probably
      > missing something obvious here, or perhaps a subtlety in
      > interpretation.
      >

      Yes, there's a nuance here that will lead some folks to believe that my
      example is no different than Mike's and resembles the SPARQL endpoint
      I'm griping about, and conclude that I've contradicted myself when I
      haven't. I'm not using POST to execute queries, SPARQL syntax hasn't
      leaked out into my URIs, and my API is *somewhat* self-documenting, in
      that the query isn't entirely opaque when presented with the results it
      generates.

      -Eric
    • Mark Baker
      Message 2 of 22, Feb 3, 2011
        Hi Alistair,

        On Thu, Feb 3, 2011 at 3:52 AM, Alistair Miles <alimanfoo@...> wrote:
        > What we really wanted to be able to do was be able to place a hard limit
        > on the amount of resources any one query could consume. A simple way to do
        > this might be to kill queries that took longer than X seconds, a la simpledb
        > [3]. My colleague Graham Klyne and I had a chat with Andy Seaborne about doing
        > this with Jena TDB, which wasn't possible at the time, and Andy seemed to
        > think it was doable (?) but there were details around how TDB executed queries
        > and also how TDB's various optimisers worked that I didn't fully understand,
        > and we never had the time to follow this through. I haven't done any recent
        > work on SPARQL, so it may be that there are query engines out there that
        > support this kind of thing out of the box.

        That would certainly work in the sense of bringing the cost down, but
        it would be a shadow of a SPARQL endpoint from the point of view of
        client expectations, no? Its proper functioning would be dependent on
        far too many variables that the client has no control over.

        That could be remedied by the publisher documenting a set of queries
        which it can guarantee will complete in a reasonable time, because it
        has optimized specifically for them (indexes, caching, etc..) ... but
        then that's exactly what they'd be doing if they put an HTTP interface
        in front of that data.

        > So I guess I'm saying, you may be right, but I wouldn't discount the viability
        > of open sparql endpoints just yet, I think the jury's still out.

        I'd be happy to be proven wrong because it would clearly be awesome to
        be able to use SPARQL over the 'net. Alas, everything I know about
        the Web tells me I'm not.

        Mark.
      • Alistair Miles
        Message 3 of 22, Feb 4, 2011
          Hi Mark,

          On Thu, Feb 03, 2011 at 03:25:42PM -0500, Mark Baker wrote:
          > Hi Alistair,
          >
          > On Thu, Feb 3, 2011 at 3:52 AM, Alistair Miles <alimanfoo@...> wrote:
          > > What we really wanted to be able to do was be able to place a hard limit
          > > on the amount of resources any one query could consume. A simple way to do
          > > this might be to kill queries that took longer than X seconds, a la simpledb
          > > [3]. My colleague Graham Klyne and I had a chat with Andy Seaborne about doing
          > > this with Jena TDB, which wasn't possible at the time, and Andy seemed to
          > > think it was doable (?) but there were details around how TDB executed queries
          > > and also how TDB's various optimisers worked that I didn't fully understand,
          > > and we never had the time to follow this through. I haven't done any recent
          > > work on SPARQL, so it may be that there are query engines out there that
          > > support this kind of thing out of the box.
          >
          > That would certainly work in the sense of bringing the cost down, but
          > it would be a shadow of a SPARQL endpoint from the point of view of
          > client expectations, no? Its proper functioning would be dependent on
          > far too many variables that the client has no control over.
          >
          > That could be remedied by the publisher documenting a set of queries
          > which it can guarantee will complete in a reasonable time, because it
          > has optimized specifically for them (indexes, caching, etc..) ... but
          > then that's exactly what they'd be doing if they put an HTTP interface
          > in front of that data.

          That's a good point. But I do wonder if there is still a middle-ground worth
          a bit of exploration. By that I mean, for any given endpoint, based on my
          experience with Jena TDB and the FlyBase dataset (~180m triples) [1] there
          will probably still be quite a large space of possible queries that will
          execute with low cost even without the service making indexing or caching
          optimisations specific to the data.

          E.g., try doing the example search at
          http://openflydata.org/flyui/build/apps/expressionbygenebatch/ with firebug
          open to see the underlying SPARQL queries. These queries are not trivial
          but usually complete in less than a few seconds (some are sub-second), and
          the endpoints are all hosted on a modest EC2 m1.small instance. No specific
          optimisations were made for any of these endpoints, beyond the use of the
          generic TDB statistics-based optimiser.

          (In case you were wondering, those are fruit fly embryos and testes you're
          looking at :)

          Don't get me wrong, I'm not trying to claim SPARQL will revolutionise the
          Web, and I haven't done any work with SPARQL since 2009, so I have nothing
          invested in it.

          But I do wonder if, for people who have an interesting dataset that they'd
          like to share with others, exposing their dataset via a SPARQL endpoint
          would be worthwhile, even if they limited resource usage. I.e., the data
          publisher would say, "here's my data, you can execute x queries per second,
          any queries longer than y seconds will get killed, here's some statistics
          about the data to help you figure out what's there, go explore". Any third
          parties then interested in re-using the data could try a few sparql queries to
          see if they were efficient, and if so, query the SPARQL endpoint directly in
          their mashup/... application. If the queries they needed turned out not to be
          so efficient, then they could begin a dialogue with the data provider about
          an HTTP interface that is optimised for a particular set of requirements,
          or they could harvest (via the SPARQL endpoint?), cache and index the data
          they need and do their own optimisation.
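          The "kill queries longer than y seconds" half of that policy can be
          sketched with a thread pool. Everything here is an illustrative
          assumption (the limit, the names, the executor), not any real
          engine's mechanism:

          ```python
          # Sketch of a publisher-side time budget for queries, assuming a
          # hypothetical query function and an advertised hard limit.
          from concurrent.futures import ThreadPoolExecutor, TimeoutError
          import time

          QUERY_TIMEOUT_SECONDS = 1.0  # the publisher's advertised limit

          pool = ThreadPoolExecutor(max_workers=4)

          def run_with_limit(query_fn):
              """Run a query, abandoning it if it exceeds the time budget."""
              future = pool.submit(query_fn)
              try:
                  return future.result(timeout=QUERY_TIMEOUT_SECONDS)
              except TimeoutError:
                  # Best effort only: a thread already running can't be
                  # killed, which is why real engines need support for this
                  # inside the query executor itself.
                  future.cancel()
                  return None  # caller maps this to an error response
          ```

          The comment in the except branch is the crux: without cooperation
          from the query engine, the timeout only stops the client waiting,
          not the work, which is exactly why this needed support inside TDB.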

          I think this is especially interesting where a dataset is widely
          applicable. E.g., FlyBase stores reference data on the fruit fly genome,
          which is central to genomic research in Drosophila and is re-used in an
          extremely diverse range of applications. For FlyBase, they serve their
          community better by providing more flexible interfaces to their data, because
          they cannot possibly predict all requirements. (In fact, FlyBase currently
          provide an SQL endpoint on a best-effort basis, which anyone can use if they
          know where to find it.)

          And providing a query endpoint is a nice way of lowering the costs for third
          parties re-using your data. E.g., if it's a SPARQL endpoint, then re-using
          the data is as simple as writing a bit of Python/Perl/..., or putting
          some HTML and javascript on a web server, very low infrastructure costs
          or complexity. This is more relevant where there are lots of small-scale
          data re-users that don't have access to hosting infrastructure, which is
          particularly the case in biological research.
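          That "bit of Python" really can be a few stdlib lines. A hedged
          sketch, where the endpoint URL is a placeholder assumption and no
          request is actually sent:

          ```python
          # Sketch of a minimal SPARQL-over-HTTP client: build the GET
          # request a re-user would send to a publisher's endpoint.
          # The endpoint URL is a placeholder, not a real service.
          from urllib.parse import urlencode
          from urllib.request import Request

          ENDPOINT = "http://example.org/sparql"  # hypothetical

          def build_query_request(query):
              """Return the GET request for a query, asking for XML results."""
              url = ENDPOINT + "?" + urlencode({"query": query})
              return Request(
                  url,
                  headers={"Accept": "application/sparql-results+xml"},
              )

          req = build_query_request("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
          # urllib.request.urlopen(req) would fetch the results; omitted
          # here so the sketch stays offline.
          ```

          No SDK, no hosting: the whole client is URL encoding plus an
          Accept header, which is the low-infrastructure point being made.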

          Having said all that, I've found a fair few queries that SPARQL is just crap
          at, and will never work without data-specific indexing and caching. Graham
          Klyne recently got some funding to work with the Jena guys to look at extending
          SPARQL endpoints with multiple data-specific indexes [2], that could be
          interesting, and I'm sure others are working in the same space.

          A few shades of grey worth talking about here?

          Cheers,

          Alistair

          [1] http://code.google.com/p/openflydata/wiki/FlyBaseMilestone3
          [2] http://code.google.com/p/milarq/

          --
          Alistair Miles
          Head of Epidemiological Informatics
          Centre for Genomics and Global Health <http://cggh.org>
          The Wellcome Trust Centre for Human Genetics
          Roosevelt Drive
          Oxford
          OX3 7BN
          United Kingdom
          Web: http://purl.org/net/aliman
          Email: alimanfoo@...
          Tel: +44 (0)1865 287669
        • Danny Ayers
          Message 4 of 22, Feb 4, 2011
            The Talis Platform (http://talis.com/platform) is a Software as a Service system providing SPARQL-capable RDF stores alongside stores for arbitrary content (HTML, blobs, whatever). It's particularly relevant here because access to the Platform is solely through HTTP (and I believe generally RESTful). I'm not up-to-date on developments, so I sent a ping on Twitter; Sam (cc'd) responded:

            [[
            At the moment, the query throttling in the Platform works rather naively. Just like Alistair with openflydata, we've restricted some language features. Some of this is explicit: some SPARQL 1.1 features we have enabled (like aggregates etc.), others disabled (property paths). Also, we don't support any extension or property functions at the moment, so I guess that restricts quite a lot of *potential* functionality. We also do the other thing that Alistair talks about in the mail thread: we time how long each query takes to execute and terminate those that run over a certain threshold, currently 30 seconds. Obviously, this is at best crude and is something we'd like to improve. One major shortcoming currently is that a terminated query returns just an error response, no results. A patch has recently been submitted to ARQ to allow terminated queries to at least return the results they've already got; Paolo is working on getting that accepted now.

            I know it's a bit apples & oranges, but do you think the SPARQL Uniform HTTP Protocol [3] is relevant to the thread on rest-discuss?
            ...
            [1] http://twitter.com/#!/danja/statuses/33094970680283136
            [2] https://issues.apache.org/jira/browse/JENA-29
            [3] http://www.w3.org/TR/sparql11-http-rdf-update/
            ]]

            --
            http://danny.ayers.name

          • Bob Ferris
            Message 5 of 22, Feb 4, 2011
              Hi,

              I recently became aware of Fuseki [1], a SPARQL server. The claim on the
              wiki is:

              "It provides the REST-style SPARQL HTTP Update, and SPARQL Query and
              SPARQL Update using the SPARQL protocol over HTTP."

              I think the "hypermedia as the engine of application state" constraint
              is not fulfilled, or?

              Cheers,


              Bob

              [1] http://openjena.org/wiki/Fuseki
            • Eric J. Bowman
              Message 6 of 22, Feb 4, 2011
                Bob Ferris wrote:
                >
                > I think the "hypermedia as the engine of application state"
                > constraint is not fulfilled, or?
                >

                Well, that's one of 'em. In REST circles, you'll encounter the phrase
                "HTTP != REST" which means that REST is a subset of the things you can
                do with HTTP, not the entire set of things you can do with HTTP. 99%
                of "REST APIs" out there are really HTTP APIs, all hail the power of
                the buzzword...

                -Eric
              • Nathan
                Message 7 of 22, Feb 4, 2011
                  Eric J. Bowman wrote:
                  > Bob Ferris wrote:
                  >> I think the "hypermedia as the engine of application state"
                  >> constraint is not fulfilled, or?
                  >>
                  >
                  > Well, that's one of 'em. In REST circles, you'll encounter the phrase
                  > "HTTP != REST" which means that REST is a subset of the things you can
                  > do with HTTP, not the entire set of things you can do with HTTP. 99%
                  > of "REST APIs" out there are really HTTP APIs, all hail the power of
                  > the buzzword...

                  and weirdly HTTP is not a superset of REST, as in you can't do
                  everything with HTTP that REST indicates you could/should (the mismatches)

                  fair comment?
                • Eric J. Bowman
                  Message 8 of 22, Feb 4, 2011
                    Nathan wrote:
                    >
                    > and weirdly HTTP is not a superset of REST, as in you can't do
                    > everything with HTTP that REST indicates you could/should (the
                    > mismatches)
                    >
                    > fair comment?
                    >

                    Absolutely. But, assuming a RESTful design, HTTP could be replaced in
                    the future with Waka, HTTP 2, or whatever -- clearing up those
                    mismatches inherent in HTTP 1.1 without actually changing the API.

                    -Eric