Re: [rest-discuss] Why SPARQL endpoints aren't even remotely RESTful.
- Nathan wrote:
> Eric J. Bowman wrote:
> > There is one way I can think of to use SPARQL queries in a REST app,
> > which is to POST or PUT a representation as
> > application/sparql-query to some URI. Dereferencing that URI
> > executes the query as a stored procedure, returning
> > application/sparql-results+xml by default, but can also return the
> > original query with Accept: application/sparql-query.
>
> Indeed :) one little question though, what happens when somebody GETs
> the URI?
>
> For example, given such a scenario I'd quite like to send people back
> some HTML, with a form in it, that allowed them to run test SPARQL
> queries and get back the "raw results", say by putting the query in a
> form element and submitting the form. Sound feasible / RESTful? If
> so, POST/PUT or GET?

"Dereferencing that URI executes the query..."

You're confusing me. ;-) My scenario executes one stored SPARQL query
(by default, unless Accept: application/sparql-query) at a fixed URI.
POST or PUT can create that URI (depending on whether the user-agent or
the origin server assigns the URI), by uploading the query as a file.
PUT may be used to replace the query, i.e. edit the file. Note that
PUT will only Allow: application/sparql-query -- you can't edit the
result, PUT that back, and expect the server to reformulate the query.
Standard REST design pattern, I've used it with PHP, XQuery, SSJS, JSP,
ASP... not always self-descriptively, as PHP etc. lack media types, and
always access-restricted for methods other than GET/HEAD/OPTIONS.
You're saying you want GET to return HTML results with a form? Fine,
add text/html to the conneg mix, return the results with a form in
that representation, pre-fill the textarea with the current raw SPARQL
query, and instruct the user-agent to PUT application/sparql-query
to the URI upon submission.
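As a rough sketch of the pattern described above (one stored query at a fixed URI, with conneg choosing between results, the raw query, and an HTML editing form), here is a minimal, framework-free Python illustration. All names are hypothetical and query execution is stubbed out; this is not any real SPARQL library's API:

```python
# Hypothetical sketch of the stored-query resource pattern: one SPARQL
# query lives at a fixed URI; GET executes it (or returns the query
# itself / an editing form via conneg); PUT replaces the query.

class StoredQueryResource:
    def __init__(self, query: str):
        self.query = query  # the stored application/sparql-query entity

    def get(self, accept: str = "application/sparql-results+xml"):
        """Dereference the URI: run the stored query by default, or
        return the raw query or an HTML form depending on Accept."""
        if accept == "application/sparql-query":
            return accept, self.query
        if accept == "text/html":
            # Pre-fill a textarea with the current raw query; submitting
            # should PUT application/sparql-query back to this URI.
            # (HTML forms only submit GET/POST natively, so a browser
            # would need script to actually issue the PUT.)
            form = (
                "<form action='' method='PUT'>"
                f"<textarea name='query'>{self.query}</textarea>"
                "<input type='submit'/></form>"
            )
            return accept, form
        # Default: execute as a stored procedure (execution stubbed out).
        return "application/sparql-results+xml", self._execute()

    def put(self, content_type: str, body: str):
        """Replace the stored query. Only application/sparql-query is
        allowed; you can't edit the results and PUT those back."""
        if content_type != "application/sparql-query":
            return 415  # Unsupported Media Type
        self.query = body
        return 204  # No Content

    def _execute(self) -> str:
        # Stub: a real server would evaluate self.query against a store.
        return f"<sparql><!-- results of: {self.query} --></sparql>"
```

The point of the sketch is that the query is server-side state addressed by a URI, not syntax leaked into the URI or a POST body used for retrieval.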
> ps: a little confused after reading the above "one way I can think of
> to use SPARQL queries in a REST app, which is to POST or.." and the
> mail you sent immediately before it saying "Please don't confuse a
> post about how POST isn't unRESTful, as saying that it's ever even
> remotely OK to use POST as a retrieval method." - I'm probably
> missing something obvious here, or perhaps a subtlety in

Yes, there's a nuance here that will lead some folks to believe that my
example is no different than Mike's and resembles the SPARQL endpoint
I'm griping about, and conclude that I've contradicted myself when I
haven't. I'm not using POST to execute queries, SPARQL syntax hasn't
leaked out into my URIs, and my API is *somewhat* self-documenting, in
that the query isn't entirely opaque when presented with the results it
generates.
- Hi Alistair,
On Thu, Feb 3, 2011 at 3:52 AM, Alistair Miles <alimanfoo@...> wrote:
> What we really wanted to be able to do was be able to place a hard limit
> on the amount of resources any one query could consume. A simple way to do
> this might be to kill queries that took longer than X seconds, a la
> simpledb. My colleague Graham Klyne and I had a chat with Andy Seaborne about doing
> this with Jena TDB, which wasn't possible at the time, and Andy seemed to
> think it was doable (?) but there were details around how TDB executed queries
> and also how TDB's various optimisers worked that I didn't fully understand,
> and we never had the time to follow this through. I haven't done any recent
> work on SPARQL, so it may be that there are query engines out there that
> support this kind of thing out of the box.
That would certainly work in the sense of bringing the cost down, but
it would be a shadow of a SPARQL endpoint from the point of view of
client expectations, no? Its proper functioning would be dependent on
far too many variables that the client has no control over.
That could be remedied by the publisher documenting a set of queries
which it can guarantee will complete in a reasonable time, because it
has optimized specifically for them (indexes, caching, etc..) ... but
then that's exactly what they'd be doing if they put an HTTP interface
in front of that data.
> So I guess I'm saying, you may be right, but I wouldn't discount the viability
> of open sparql endpoints just yet, I think the jury's still out.
I'd be happy to be proven wrong because it would clearly be awesome to
be able to use SPARQL over the 'net. Alas, everything I know about
the Web tells me I'm not.
- Hi Mark,
On Thu, Feb 03, 2011 at 03:25:42PM -0500, Mark Baker wrote:
> Hi Alistair,
> On Thu, Feb 3, 2011 at 3:52 AM, Alistair Miles <alimanfoo@...> wrote:
> > What we really wanted to be able to do was be able to place a hard limit
> > on the amount of resources any one query could consume. A simple way to do
> > this might be to kill queries that took longer than X seconds, a la
> > simpledb. My colleague Graham Klyne and I had a chat with Andy Seaborne about doing
> > this with Jena TDB, which wasn't possible at the time, and Andy seemed to
> > think it was doable (?) but there were details around how TDB executed queries
> > and also how TDB's various optimisers worked that I didn't fully understand,
> > and we never had the time to follow this through. I haven't done any recent
> > work on SPARQL, so it may be that there are query engines out there that
> > support this kind of thing out of the box.
> That would certainly work in the sense of bringing the cost down, but
> it would be a shadow of a SPARQL endpoint from the point of view of
> client expectations, no? Its proper functioning would be dependent on
> far too many variables that the client has no control over.
> That could be remedied by the publisher documenting a set of queries
> which it can guarantee will complete in a reasonable time, because it
> has optimized specifically for them (indexes, caching, etc..) ... but
> then that's exactly what they'd be doing if they put an HTTP interface
> in front of that data.
That's a good point. But I do wonder if there is still a middle-ground worth
a bit of exploration. By that I mean, for any given endpoint, based on my
experience with Jena TDB and the FlyBase dataset (~180m triples), there
will probably still be quite a large space of possible queries that will
execute with low cost even without the service making indexing or caching
optimisations specific to the data.
E.g., try doing the example search at
http://openflydata.org/flyui/build/apps/expressionbygenebatch/ with firebug
open to see the underlying SPARQL queries. These queries are not trivial
but usually complete in less than a few seconds (some are sub-second), and
the endpoints are all hosted on a modest EC2 m1.small instance. No specific
optimisations were made for any of these endpoints, beyond the use of the
generic TDB statistics-based optimiser.
(In case you were wondering, those are fruit fly embryos and testes you're
looking at :)
Don't get me wrong, I'm not trying to claim SPARQL will revolutionise the
Web, and I haven't done any work with SPARQL since 2009, so I have nothing
invested in it.
But I do wonder if, for people who have an interesting dataset that they'd
like to share with others, exposing their dataset via a SPARQL endpoint
would be worthwhile, even if they limited resource usage. I.e., the data
publisher would say, "here's my data, you can execute x queries per second,
any queries longer than y seconds will get killed, here's some statistics
about the data to help you figure out what's there, go explore". Any third
parties then interested in re-using the data could try a few SPARQL queries to
see if they were efficient, and if so, query the SPARQL endpoint directly in
their mashup/... application. If the queries they needed turned out not to be
so efficient, then they could begin a dialogue with the data provider about
an HTTP interface that is optimised for a particular set of requirements,
or they could harvest (via the SPARQL endpoint?), cache and index the data
they need and do their own optimisation.
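The resource-limiting policy described above ("queries longer than y seconds get killed") can be sketched in a few lines. This is a hedged illustration, not Jena/TDB's API; `run_with_limit` and `execute_query` are hypothetical names:

```python
# Hypothetical sketch: enforce a hard wall-clock budget on query
# execution by running it in a worker thread and abandoning it when
# the budget expires.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as QueryTimeout

def run_with_limit(execute_query, query, budget_seconds=30.0):
    """Return ('ok', results) or ('killed', None)."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(execute_query, query)
    try:
        return "ok", future.result(timeout=budget_seconds)
    except QueryTimeout:
        # Note: the worker thread keeps running after we give up on it;
        # truly stopping it needs cooperative cancellation inside the
        # query engine, which is exactly the TDB difficulty discussed
        # above.
        future.cancel()
        return "killed", None
    finally:
        pool.shutdown(wait=False)
```

The caveat in the comment is the interesting part: a timeout wrapper alone only bounds what the client waits for, not what the server spends, unless the engine itself can be interrupted.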
I think this is especially interesting where a dataset is widely
applicable. E.g., FlyBase stores reference data on the fruit fly genome,
which is central to genomic research in Drosophila and is re-used in an
extremely diverse range of applications. For FlyBase, they serve their
community better by providing more flexible interfaces to their data, because
they cannot possibly predict all requirements. (In fact, FlyBase currently
provide an SQL endpoint on a best-effort basis, which anyone can use if
they know where to find it.)
And providing a query endpoint is a nice way of lowering the costs for third
parties re-using your data. E.g., if it's a SPARQL endpoint, then re-using
the data is as simple as writing a bit of Python/Perl/..., without
taking on any extra hosting or complexity. This is more relevant where
there are lots of small-scale
data re-users that don't have access to hosting infrastructure, which is
particularly the case in biological research.
Having said all that, I've found a fair few queries that SPARQL is just crap
at, and will never work without data-specific indexing and caching. Graham
Klyne got some funding recently to work with the Jena guys to look at
extending SPARQL endpoints with multiple data-specific indexes; that
could be interesting, and I'm sure others are working in the same space.
A few shades of grey worth talking about here?
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Tel: +44 (0)1865 287669
- The Talis Platform (http://talis.com/platform) is a Software as a Service system providing SPARQL-capable RDF stores alongside stores for arbitrary content (HTML, blobs, whatever). It's particularly relevant here because access to the Platform is solely through HTTP (and I believe generally RESTful). I'm not up-to-date on developments so I sent a ping on Twitter, Sam (cc'd) responded:
At the moment, the query throttling in the Platform works rather naively. Just like Alistair with openflydata, we've restricted some language features. Some of this is explicit: some SPARQL 1.1 features we have enabled (like aggregates), others are disabled (property paths). Also, we don't support any extension or property functions at the moment, so I guess that restricts quite a lot of *potential* functionality.

We also do the other thing that Alistair talks about in the mail thread: we time how long each query takes to execute and terminate those that run over a certain threshold, currently 30 seconds. Obviously, this is at best crude and is something we'd like to improve.

One major shortcoming currently is that a terminated query returns just an error response, no results. A patch has recently been submitted to ARQ to allow terminated queries to at least return the results they've already got; Paolo is working on getting that accepted now.
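The behaviour that ARQ patch aims for, returning whatever a terminated query has already produced rather than a bare error, can be sketched roughly like this. This is a hypothetical illustration, not ARQ's actual API:

```python
# Hypothetical sketch: stream result bindings until a time budget runs
# out, then return the partial results with a truncation flag instead
# of discarding them in a bare error response.
import time

def collect_until_deadline(binding_iter, budget_seconds=30.0):
    """Return (bindings, truncated)."""
    deadline = time.monotonic() + budget_seconds
    bindings = []
    for binding in binding_iter:
        bindings.append(binding)
        if time.monotonic() >= deadline:
            return bindings, True  # terminated, but with what we have
    return bindings, False
```

A client seeing `truncated = True` knows the result set is incomplete because the budget expired, not because the data ran out.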
I know it's a bit apples & oranges, but do you think the SPARQL Uniform HTTP Protocol is relevant to the thread on rest-discuss?
- I recently became aware of Fuseki, a SPARQL server. The claim on its
homepage is:
"It provides the REST-style SPARQL HTTP Update, and SPARQL Query and
SPARQL Update using the SPARQL protocol over HTTP."
I think the "hypermedia as the engine of application state" constraint
is not fulfilled, or?
- Bob Ferris wrote:
> I think the "hypermedia as the engine of application state"
> constraint is not fulfilled, or?

Well, that's one of 'em. In REST circles, you'll encounter the phrase
"HTTP != REST" which means that REST is a subset of the things you can
do with HTTP, not the entire set of things you can do with HTTP. 99%
of "REST APIs" out there are really HTTP APIs, all hail the power of
the buzzword...
- Eric J. Bowman wrote:
> Bob Ferris wrote:
> > I think the "hypermedia as the engine of application state"
> > constraint is not fulfilled, or?
>
> Well, that's one of 'em. In REST circles, you'll encounter the phrase
> "HTTP != REST" which means that REST is a subset of the things you can
> do with HTTP, not the entire set of things you can do with HTTP. 99%
> of "REST APIs" out there are really HTTP APIs, all hail the power of
> the buzzword...

and weirdly HTTP is not a superset of REST, as in you can't do
everything with HTTP that REST indicates you could/should (the
mismatches) - fair comment?
- Nathan wrote:
> and weirdly HTTP is not a superset of REST, as in you can't do
> everything with HTTP that REST indicates you could/should (the
> mismatches) - fair comment?

Absolutely. But, assuming a RESTful design, HTTP could be replaced in
the future with Waka, HTTP 2, or whatever -- clearing up those
mismatches inherent in HTTP 1.1 without actually changing the API.