Why SPARQL endpoints aren't even remotely RESTful.

  • Eric J. Bowman
    Message 1 of 22 , Feb 2, 2011
      Danny Ayers wrote:
      >
      > For example, if I go to:
      >
      > http://api.talis.com/stores/bbc-backstage/services/sparql
      >
      > and enter the query :
      >
      > select ?s where { ?s ?p ?o }
      > limit 10
      >
      > then click the "Search" button, I get a bunch of results in SPARQL
      > results format.
      >

      When I go to that page, I see not even a clue about the nature of the
      interface, other than that I'll need the out-of-band knowledge of some
      query language to use it. Where are the instructions for how to
      transition to the next application state, given *any* goal? This is
      indeed an RPC endpoint, not a hypertext API.

      The corollary is to run your weblog by providing a textbox which takes a
      SQL query, instead of encapsulating SQL within a hypertext interface
      (i.e. running WordPress). This is precisely what Roy is talking about,
      in his final bullet point, here:

      http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven

      Also, from the comments to that post:

      "When I say hypertext, I mean the simultaneous presentation of
      information and controls such that the information becomes the
      affordance through which the user (or automaton) obtains choices and
      selects actions... Machines can follow links when they understand the
      data format and relationship types... It is the same basic issue as
      with human communication: we will always need a common vocabulary to
      make sense of it. Exposing that vocabulary in the representations makes
      it easy to learn and be adopted by others."

      The data format is HTML, which says nothing about SPARQL, and there is
      no link relation. So the vocabulary isn't exposed in hypertext at all.
      The interaction is not based on the information presented in the
      hypertext, therefore it is being driven by out-of-band information.
      Google's search API (though not entirely RESTful) accepts keywords,
      with a syntax defined here:

      http://www.google.com/advanced_search

      It should be obvious that there's a very fundamental difference between
      Google's homepage, and the advanced_search page -- the former relies on
      out-of-band information to add '&num=10', the latter makes it a RESTful
      hypertext control; SPARQL endpoints don't even encode number of results
      as a name/value pair, instead making it part of one opaque search
      phrase (limit+10 tacked on at the end) and needlessly complicating the
      issue of input validation on both the client and the server sides.
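
      A rough Python sketch of the difference (the example.org URIs are
      made up, purely for illustration -- the point is only where the
      limit lives):

        from urllib.parse import urlencode

        # "Results per page" as a name/value pair declared by a hypertext
        # form: visible to, and checkable by, both client and server.
        form_style = "http://example.org/search?" + urlencode(
            {"q": "hitler cats", "num": 10})

        # The same limit buried inside one opaque, urlencoded SPARQL
        # phrase: neither side can validate it without parsing the query.
        endpoint_style = "http://example.org/sparql?" + urlencode(
            {"query": "select ?s where { ?s ?p ?o } limit 10"})

        print(form_style)
        print(endpoint_style)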

      Taking Google's advanced_search interface a little further, RDFa could
      be used to describe the "results per page" control, and type it as an
      integer. A more advanced forms language could express the range that
      the server will accept. This allows client-side input validation.
      Google allows any value; what would make more sense would be to take
      their form control literally -- limiting results-per-page to a set
      number of values improves cacheability.
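
      A minimal sketch of that idea, assuming a made-up description of the
      control (no such RDFa vocabulary exists yet, as noted above):

        # Hypothetical metadata a client might extract from an RDFa-
        # annotated "results per page" control; the names are invented.
        results_per_page = {
            "name": "num",
            "datatype": "integer",
            "allowed": [10, 25, 50, 100],  # a fixed set improves cacheability
        }

        def valid(control, raw_value):
            """Client-side validation before the form is ever submitted."""
            if control["datatype"] == "integer":
                try:
                    raw_value = int(raw_value)
                except ValueError:
                    return False
            return raw_value in control["allowed"]

        assert valid(results_per_page, "25")
        assert not valid(results_per_page, "37")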

      >
      > then click the "Search" button, I get a bunch of results in SPARQL
      > results format.
      >

      No, it returns a representation as application/xml, which means I need
      to sniff in order to determine that it's a SPARQL result. To meet the
      self-descriptive messaging constraint of REST, the results would
      properly be sent as application/sparql-results+xml, but making that
      change alone won't make the API RESTful. As the results from an actual
      hypertext API, it makes a fine media type, although I'd personally tack
      on an XML PI to call some XSLT to transform it into HTML, assuming my
      hypertext interface was also HTML.
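
      In other words, a client should be able to trust the Content-Type
      instead of sniffing. A sketch, with example.org standing in for a
      real endpoint:

        from urllib.request import Request, urlopen

        req = Request("http://example.org/sparql?query=...",  # placeholder
                      headers={"Accept": "application/sparql-results+xml"})
        with urlopen(req) as resp:
            ctype = resp.headers.get("Content-Type", "")
            if ctype.startswith("application/sparql-results+xml"):
                results = resp.read()  # self-descriptive: no sniffing needed
            else:
                raise ValueError("opaque payload, would have to sniff: " + ctype)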

      >
      > and enter the query :
      >
      > select ?s where { ?s ?p ?o }
      > limit 10
      >

      How do I know what to enter, when instead of entering keywords for a
      search, I have to enter a query formatted in a manner not afforded
      through hypertext controls? A REST API would have one hypertext
      control for select=, providing me with the options the server has
      implemented. Instead of making users guess at what namespaces are
      supported, a REST API would provide that list as a hypertext control.
      The server tells the user-agent the parameters of the API, such that
      the user-agent only needs to fill in the search terms (keywords, not
      instructions, particularly not instructions which amount to tunneling a
      custom method like CONSTRUCT over POST).

      *That's* what I mean by providing instructions for how to execute a
      state transition, not urlencoding an opaque query language and letting
      the server sort it out. The goal of a REST API is not to encode query
      languages as URIs this way, it's to abstract away such implementation
      details behind a generic interface. No (reasonable) CMS based on SQL
      presents SQL queries as URIs or in hypertext, that implementation
      detail is abstracted away behind the interface, which is exactly how
      SPARQL can be made RESTful (as opposed to providing non-hypertext-API
      endpoints). The server converts the request into a SPARQL query for a
      back-end system in REST, as opposed to exposing a SPARQL endpoint -- no
      different from how SQL is handled in REST APIs.
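
      For example, a server-side mapping along these lines (the form
      fields and the query shape are hypothetical -- the point is that
      SPARQL never reaches the URI or the hypertext):

        from urllib.parse import parse_qs

        def search_to_sparql(query_string):
            """Turn a generic search request into SPARQL behind the API."""
            params = parse_qs(query_string)
            keyword = params.get("q", [""])[0].replace('"', '\\"')
            limit = int(params.get("num", ["10"])[0])
            return ('SELECT ?s WHERE { ?s ?p ?o . '
                    'FILTER regex(str(?o), "%s", "i") } LIMIT %d'
                    % (keyword, limit))

        print(search_to_sparql("q=hitler&num=25"))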

      There is one way I can think of to use SPARQL queries in a REST app,
      which is to POST or PUT a representation as application/sparql-query to
      some URI. Dereferencing that URI executes the query as a stored
      procedure, returning application/sparql-results+xml by default, but
      can also return the original query with Accept: application/sparql-
      query. I've used the eXist DB this way, creating cells containing
      XQuery, which is a nice way to create a Web app from an XML store.
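
      A sketch of that pattern from the client's side (URI and query are
      made up; only the methods and media types matter):

        from urllib.request import Request, urlopen

        uri = "http://example.org/queries/all-subjects"  # hypothetical
        query = b"SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"

        # Store (or later replace) the query itself as the resource.
        urlopen(Request(uri, data=query, method="PUT",
                        headers={"Content-Type": "application/sparql-query"}))

        # Dereferencing the URI executes the stored query...
        results = urlopen(Request(
            uri, headers={"Accept": "application/sparql-results+xml"})).read()

        # ...unless the client negotiates for the query text itself.
        original = urlopen(Request(
            uri, headers={"Accept": "application/sparql-query"})).read()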

      -Eric
    • Bob Ferris
      Message 2 of 22 , Feb 2, 2011
        Hi Eric,

        thanks a lot for clarifying the SPARQL-to-REST relation. So I can
        conclude that a SPARQL endpoint/SPARQL query interface, at least à la
        Google's advanced_search, can be RESTful. I didn't think that this
        needs a separate (query and/or) result media type, since one is also
        able to serialize such results into RDF representation formats, e.g. RDFa.
        What I had, and still have, in mind was of course a more advanced
        query interface than a simple text box (so, sorry that this obviously
        caused misinterpretations). Even an interface like Google's
        advanced_search isn't all that comfortable, is it? I can rather
        imagine a kind of faceted-browsing interface for formulating a query,
        where the end user never really gets in touch with the statements
        behind it. This depends, of course, on the specific application
        domain, but one often needs contexts such as time or place. So
        selecting an appropriate time interval on a timeline interface, or
        selecting a place/area on a world map interface, might be the better
        option, wouldn't it?

        Cheers,


        Bob


      • Eric J. Bowman
        Message 3 of 22 , Feb 2, 2011
          Bob Ferris wrote:
          >
          > thanks a lot for clarifying the SPARQL-to-REST relation.
          >

          You're welcome. As per usual, nobody has to read my long-winded
          explanations, but it does help me to write them...

          >
          > What I had, and still have, in mind was of course a more advanced
          > query interface than a simple text box (so, sorry that this
          > obviously caused misinterpretations).
          >

          The reason this causes misinterpretation is that the nature of the
          hypertext controls makes all the difference in the world as to whether
          or not an API is RESTful. I've not seen an example of a SPARQL
          endpoint that isn't just a textarea, so I assume that's what's meant by
          SPARQL endpoint. "RESTful SPARQL API" is a non sequitur to me, because
          if I were to implement SPARQL, none of its syntax would leak into the
          URIs or the representations (except to return
          application/sparql-results+xml if negotiated for) -- I'd have an
          RDF-aware "RESTful search API".

          >
          > So I can conclude that a SPARQL endpoint/SPARQL query interface, at
          > least à la Google's advanced_search, can be RESTful.
          >

          Right, the problem isn't creating an interface which *accepts* SPARQL
          syntax; the problem is creating an interface *for* SPARQL syntax. The
          drawback is that it takes some more work to realize the concept of
          cross-site queries, than just knowing the SPARQL endpoint address for
          each site.

          A hypertext control for the number of results to return might be
          marked up differently on each site. RDFa allows those controls to
          describe themselves using a common vocabulary (which doesn't yet exist)
          for gathering search data (including locations and dates). Manipulating
          the controls depends on the user-agent's understanding of that
          vocabulary, plus whatever forms language is used.

          Note that when I mention RDFa, I'm talking about a layer above REST
          that's completely optional. My goal is to have the same API for humans
          and machines, and I believe RDFa allows one representation to service
          both types of user. Anyway, RESTful search APIs (regardless of the
          technologies used to implement) with RDFa seems a more logical way
          forward to me, than SPARQL endpoints (which I have a gut feeling will
          lead to "SPARQL injection" attacks, too much "surface area" for me).

          >
          > I didn't think that this needs a separate (query and/or) result
          > media type, since one is also able to serialize such results into
          > RDF representation formats, e.g. RDFa.
          >

          You're right. The SPARQL media types may come in handy in some cases,
          while being irrelevant in others, but all achieving the common goal of
          returning the same list of links for the same query. Meaning there's
          more than one format for representing the same resource, which is why
          we have conneg; and that SPARQL media types aren't a prerequisite for a
          RESTful API which happens to use SPARQL on the backend.

          >
          > Even an interface like Google's advanced_search isn't all that
          > comfortable, is it?
          >

          I chose Google as an example, to compare and contrast the homepage
          interface with the advanced interface. It could be more user-friendly,
          sure, but the point is that I've learned how to formulate queries
          without that interface, by using that interface -- it's a self-
          documenting API. I couldn't have learned Google search syntax from the
          homepage. SPARQL endpoints, as they currently exist, don't inform me
          how to formulate queries by using that interface (I'm expected to
          already know).

          >
          > I can rather imagine a kind of faceted-browsing interface for
          > formulating a query, where the end user never really gets in touch
          > with the statements behind it.
          >

          Right; abstracting away the implementation details behind the interface
          is kinda the point. Or, "cool URIs don't change" (although URI design
          is only orthogonally related to REST). Searching a collection of cat
          photos for cats who look like Hitler is a goal. If the implementation
          is a SPARQL endpoint which simply urlencodes the query, what happens to
          that URI when the system upgrades from SPARQL to (hypothetical) GLITR?

          Whereas abstracting away the specifics of SPARQL allows the backend to
          be changed, to construct a GLITR query from the same request instead of
          a SPARQL query -- without changing the hypertext, even, assuming a
          detailed interface (as opposed to 'enter SPARQL query here') and
          (optionally) common search-form vocabulary. Maybe GLITR has more
          options, but the data I'm looking for needs to be collected regardless
          of search language, so the API for collecting that data shouldn't need
          to be changed -- aka "REST APIs don't need versioning."

          Design for longevity -- any implementation detail can be swapped out
          without breaking the system, provided it's been properly decoupled.
          Coupling your URIs to your back-end query syntax, locks you into that
          choice (unless you figure out some hairy redirection algorithms).
          Implementation details, like SPARQL, should not impact your URI
          allocation scheme.

          >
          > This depends, of course, on the specific application domain, but
          > one often needs contexts such as time or place. So selecting an
          > appropriate time interval on a timeline interface, or selecting a
          > place/area on a world map interface, might be the better option,
          > wouldn't it?
          >

          Yes. Assume the collection of cat photos includes birth/death dates.
          XForms processors include a nifty pop-up calendar date-picker for any
          field that's XSD-typed as a date. By manipulating the form, I discover
          the URI which returns "all living Hitler cats" based on choosing
          today's date and entering "hitler" as a keyword, etc. Or, I can just
          enter a date manually -- the nature of the control isn't important,
          only the nature of the data it collects.
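
          Something like this falls out of driving such a form (field names
          and URI are invented for the example):

            from datetime import date
            from urllib.parse import urlencode

            # Fields discovered from the hypertext form: a keyword, plus an
            # xsd:date the date-picker fills in (today = "still alive").
            fields = {"keyword": "hitler", "alive-on": date.today().isoformat()}
            uri = "http://example.org/cats/search?" + urlencode(fields)
            print(uri)  # the "all living Hitler cats" resource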

          This self-documenting API has now given me all the information required
          to create a dynamic resource on an unrelated domain (serendipitous re-
          use), i.e. a dynamic "all living Hitler cats" Web page which uses Code-
          on-Demand to get the current date from the user-agent, and uses that
          date to build the query URI. I can also learn, by driving the form, how
          to highlight applicable cats on their birthdays.

          Why should I have to re-code that page every couple of years when the
          service changes technology and breaks its old URIs? While "cool URIs
          don't change" isn't a constraint, following REST does tend to get you
          mostly there by encapsulating whatever back-end technologies are used,
          instead of wearing them on the ol' sleeve.

          Automating a client to search multiple collections of cat photos is a
          problem; what I'm saying is that the solution needs to be approached
          from the perspective of the hypertext constraint (being of the Web),
          rather than the perspective of a common URI allocation scheme based on
          a query language (fighting the Web). The problem of mapping hypertext
          controls into query languages shouldn't involve the URI as a solution.

          -Eric
        • Nathan
          Message 4 of 22 , Feb 2, 2011
            Eric J. Bowman wrote:
            > There is one way I can think of to use SPARQL queries in a REST app,
            > which is to POST or PUT a representation as application/sparql-query to
            > some URI. Dereferencing that URI executes the query as a stored
            > procedure, returning application/sparql-results+xml by default, but
            > can also return the original query with Accept: application/sparql-
            > query.

            Indeed :) one little question though, what happens when somebody GETs
            the URI? For example, given such a scenario I'd quite like to send
            people back some HTML, with a form in it, that allowed them to run
            test SPARQL queries and get back the "raw results", say by putting the
            query in a form element and submitting the form. Sound feasible /
            RESTful? If so, POST/PUT or GET?

            ps: a little confused after reading the above "one way I can think of
            to use SPARQL queries in a REST app, which is to POST or.." and the
            mail you sent immediately before it saying "Please don't confuse a
            post about how POST isn't unRESTful, as saying that it's ever even
            remotely OK to use POST as a retrieval method." - I'm probably missing
            something obvious here, or perhaps a subtlety in interpretation.

            Best,

            Nathan
          • Danny Ayers
            Message 5 of 22 , Feb 2, 2011
              On 2 February 2011 15:40, Eric J. Bowman <eric@...> wrote:
               

              Danny Ayers wrote:
              >
              > For example, if I go to:
              >
              > http://api.talis.com/stores/bbc-backstage/services/sparql
              >
              [snip]

              When I go to that page, I see not even a clue about the nature of the
              interface, other than that I'll need the out-of-band knowledge of some
              query language to use it. Where are the instructions for how to
              transition to the next application state, given *any* goal? This is
              indeed an RPC endpoint, not a hypertext API.

               
              The query box form is just one way of approaching a SPARQL endpoint - essentially just a debugging tool - and absolutely not typical of the kind of interfaces to be found in systems that use SPARQL. As I said before, it's misleading. I personally consider the behaviour of the query box as being RESTful, but arguing over that particular aspect is really missing the whole point of the endpoint.

              I've left it a bit late to go through your points one by one, I'll re-read tomorrow. But for now I'll leave you with this to look at:

              http://reference.data.gov.uk/doc/department/dft

              The top half of the page should tick at least some of your boxes regarding hypertext. But the work is done by a SPARQL endpoint, as you can see if you scroll to the bottom of the page. OK, there's a thin presentation layer on top, but basically it's just mapping nice-looking URIs to their ugly SPARQL counterparts, and formatting the ugly results so they look OK in an HTML browser.

              If you copy & paste the query into the form at:

              http://services.data.gov.uk/reference/sparql

              you can see the ugly versions.

              The pretty/ugly URIs and the pretty/ugly formats are effectively isomorphic, the difference being that the pretty versions are tailored for a regular HTML browser with a human sat in front of it, with the aid of a bit of JSON/Javascript. HTML is a hypertext format by virtue of a user agent (usually a regular Web browser) being able to interpret the links in it as a means to the transfer of state via representations. The same goes for any other format - and the browser isn't the only kind of agent.

              btw, the link to the endpoint on the Department for Transport page is broken, so I got that URI by looking at the source. Alas there I discovered the form in the page uses POST for the query, which is absolutely inexcusable, especially given that a GET here yields the same results. I'll be having a word with someone about that! (The Linked Data API is still being drafted, see http://code.google.com/p/linked-data-api/ ).

              Cheers,
              Danny.





              --
              http://danny.ayers.name

            • Mark Baker
              Message 6 of 22 , Feb 2, 2011
                I would say that the proper criticism here isn't that SPARQL isn't
                RESTful, nor that it should be, but instead that the potentially
                expensive queries SPARQL enables are simply not suitable across trust
                boundaries. See (including the comments);

                http://www.markbaker.ca/blog/2006/08/sparql-useful-but-not-a-game-changer/

                Mark.
              • Danny Ayers
                Message 7 of 22 , Feb 3, 2011
                  PS.

                  I discovered the form
                  > in the page uses POST for the query, which is absolutely inexcusable,
                  > especially given that a GET here yields the same results. I'll be having a
                  > word with someone about that! (The Linked Data API is still being drafted,
                  > see http://code.google.com/p/linked-data-api/ ).

                  Issue reported:
                  http://code.google.com/p/linked-data-api/issues/detail?id=10



                  --
                  http://danny.ayers.name
                • Alistair Miles
                  Message 8 of 22 , Feb 3, 2011
                    On Wed, Feb 02, 2011 at 09:31:39PM -0500, Mark Baker wrote:
                    > I would say that the proper criticism here isn't that SPARQL isn't
                    > RESTful, nor that it should be, but instead that the potentially
                    > expensive queries SPARQL enables are simply not suitable across trust
                    > boundaries. See (including the comments);

                    I think this is a key issue too, and it's something we were aware of when
                    I did the work on openflydata.org in 2009. We explored some ideas around
                    restricting the query language features exposed by an endpoint, to prevent some
                    of the more obvious denial-of-service type vulnerabilities, which was part
                    of the reason why we ended up rolling our own SPARQL protocol implementation
                    [1]. There's a bit more discussion in the paper at [2]. Of course, even with
                    restricted language features, you can still write hard queries if you know
                    how, so this isn't a perfect strategy.

                    What we really wanted was to be able to place a hard limit
                    on the amount of resources any one query could consume. A simple way to do
                    this might be to kill queries that took longer than X seconds, à la SimpleDB
                    [3]. My colleague Graham Klyne and I had a chat with Andy Seaborne about doing
                    this with Jena TDB, which wasn't possible at the time, and Andy seemed to
                    think it was doable (?) but there were details around how TDB executed queries
                    and also how TDB's various optimisers worked that I didn't fully understand,
                    and we never had the time to follow this through. I haven't done any recent
                    work on SPARQL, so it may be that there are query engines out there that
                    support this kind of thing out of the box.
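
                    For what it's worth, a crude sketch of the "kill after X
                    seconds" idea at the application level, with query execution
                    left abstract (nothing here is specific to TDB or any real
                    engine):

                      import multiprocessing

                      def _worker(execute_query, query, out):
                          # execute_query must be a picklable, module-level function
                          out.put(execute_query(query))

                      def run_with_limit(execute_query, query, seconds):
                          """Run a query in a child process; kill it past the budget."""
                          out = multiprocessing.Queue()
                          p = multiprocessing.Process(target=_worker,
                                                      args=(execute_query, query, out))
                          p.start()
                          p.join(seconds)
                          if p.is_alive():
                              p.terminate()  # the hard resource limit
                              raise TimeoutError("query killed after %s s" % seconds)
                          return out.get()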

                    So I guess I'm saying, you may be right, but I wouldn't discount the viability
                    of open SPARQL endpoints just yet; I think the jury's still out.

                    Cheers

                    Alistair

                    [1] http://code.google.com/p/sparqlite/
                    [2] http://dx.doi.org/10.1016/j.jbi.2010.04.004
                    [3] http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/index.html?SDBLimits.html

                    --
                    Alistair Miles
                    Head of Epidemiological Informatics
                    Centre for Genomics and Global Health <http://cggh.org>
                    The Wellcome Trust Centre for Human Genetics
                    Roosevelt Drive
                    Oxford
                    OX3 7BN
                    United Kingdom
                    Web: http://purl.org/net/aliman
                    Email: alimanfoo@...
                    Tel: +44 (0)1865 287669
                  • Danny Ayers
                    Message 9 of 22 , Feb 3, 2011
                      On 3 February 2011 03:31, Mark Baker <distobj@...> wrote:
                       

                      I would say that the proper criticism here isn't that SPARQL isn't
                      RESTful, nor that it should be, but instead that the potentially
                      expensive queries SPARQL enables are simply not suitable across trust
                      boundaries. See (including the comments);

                      http://www.markbaker.ca/blog/2006/08/sparql-useful-but-not-a-game-changer/


                      On : "SPARQL is likely not going to enable new kinds of applications" - it's hard to disagree. However it does enable new ways of constructing applications, and that, while maybe not a game changer in the big picture, is a good step forward. Even if it was straight RPC, being able to access the data directly is very useful (I don't believe it is RPC, the query URIs are effectively just a systematic convention for mapping of URIs to the information space).

                      Examples of new ways of constructing applications are presented in Leigh Dodds' screencasts at:
                      http://www.talis.com/platform/demos/

                      To save you sitting through them (though they are watchable :), the BBC Data screencast there describes a simple app to browse data relating to reviews and reviewers, built using the following approach:
                      * public data from the BBC site is harvested and placed in an online triplestore (note *not* screen-scraped, they publish machine-friendly RDF with each page in their Music section)
                      * certain preset queries are created for patterns of interest in the new application
                      * those queries are given a browser/user-friendly facade

                      Having said all that, the trust boundary question is a big one.

                      I don't think the problem is necessarily to do with expensive queries - for the majority of applications the shape of the required data and suitable sources will be known in advance, so open-ended querying (/crawling) isn't necessary. The appropriate data can be harvested (via site crawling or through running CONSTRUCT queries if a SPARQL endpoint is available), filtered (probably again using CONSTRUCT) and placed in a secondary store which will act as a cache.
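
                      A sketch of that harvesting step over the standard SPARQL
                      protocol (the endpoint and vocabulary are illustrative only):

                        from urllib.parse import urlencode
                        from urllib.request import Request, urlopen

                        endpoint = "http://example.org/sparql"  # stand-in endpoint
                        construct = """
                        CONSTRUCT { ?r a <http://purl.org/stuff/rev#Review> ; ?p ?o }
                        WHERE     { ?r a <http://purl.org/stuff/rev#Review> ; ?p ?o }
                        LIMIT 1000
                        """

                        req = Request(endpoint + "?" + urlencode({"query": construct}),
                                      headers={"Accept": "text/turtle"})
                        with urlopen(req) as resp:
                            open("harvest.ttl", "wb").write(resp.read())  # local cache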

                      A slightly harder part is the issue of combining data from multiple diverse sources whilst retaining adequate provenance information to support 'trust management'. Most current setups are geared towards fairly large named graphs, rather than the little bitty ones you'd need for fine-grained processing. But quite a lot of work has already been done on this general area (there was a W3C incubator group, report: http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/).

                      A much harder part is what you get when you throw access control into the mix. While some of the individual technologies seem to have a lot of potential (notably FOAF+SSL WebID : http://esw.w3.org/WebID), I don't think there's a compelling story yet on how they would work with multiple diverse data sources, which may each have their own authentication/authorization requirements.

                      Cheers,
                      Danny.



                      --
                      http://danny.ayers.name

                    • Nathan
                      Message 10 of 22 , Feb 3, 2011
                        Mark Baker wrote:
                        > I would say that the proper criticism here isn't that SPARQL isn't
                        > RESTful, nor that it should be, but instead that the potentially
                        > expensive queries SPARQL enables are simply not suitable across trust
                        > boundaries. See (including the comments);
                        >
                        > http://www.markbaker.ca/blog/2006/08/sparql-useful-but-not-a-game-changer/

                        this supposes that a SPARQL query engine is always positioned on the
                        server side - in fact it's positioned at the edges of the network,
                        sometimes on the client (backed by an HTTP cache and conditional
                        GETs), sometimes as a shared client accessible by HTTP GETs (w/
                        optional caching) and often accessible by "normal" URIs (no query
                        string or the like), and sometimes "on the server".

                        this supposes that it's used for private data over a public interface
                        - that's orthogonal; when people have private data they can secure
                        access by any kind of auth* / use HTTP+TLS. Similarly, often the data
                        /is/ open public data - there's a term for it: "linked *open* data".

                        Now, I'm not saying "SPARQL" is perfect, but it's completely
                        orthogonal to REST - how you implement, position and expose SPARQL is
                        not though, but don't think for a second that "sparql is always on the
                        server side, never cached, always unsecured and always uses post"
                        because that's completely wrong.

                        Best,

                        Nathan
                      • Nathan
                        Message 11 of 22 , Feb 3, 2011
                          Danny Ayers wrote:
                          > A much harder part is what you get when you throw access control into the
                          > mix. While some of the individual technologies seem to have a lot of
                          > potential (notably FOAF+SSL WebID : http://esw.w3.org/WebID), I don't think
                          > there's a compelling story yet on how they would work with multiple diverse
                          > data sources, which may each have their own authentication/authorization
                          > requirements.

                          Danny, you simply place the query engine on the client side, operating
                          over a web of linked data, using conditional GETs and HTTP caching;
                          each data source can be ACL-controlled in a granular fashion that way,
                          and it's /very/ network friendly :)
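
                          i.e. something along these lines on the client (a toy
                          in-memory cache; a real HTTP cache would do this for you):

                            from urllib.error import HTTPError
                            from urllib.request import Request, urlopen

                            cache = {}  # uri -> (etag, body)

                            def conditional_get(uri):
                                headers = {"Accept": "text/turtle"}
                                if uri in cache:
                                    headers["If-None-Match"] = cache[uri][0]
                                try:
                                    with urlopen(Request(uri, headers=headers)) as resp:
                                        cache[uri] = (resp.headers.get("ETag", ""),
                                                      resp.read())
                                except HTTPError as e:
                                    if e.code != 304:  # 304: cached copy still fresh
                                        raise
                                return cache[uri][1]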

                          ps: not many people do this currently, but you can - Virtuoso, for
                          example, enables this, and there are JS implementations in the works
                          to encourage this pattern. (A full read/write, ACL-controlled web of
                          linked data is still at the prototype stage.)

                          Best,

                          Nathan
                        • Danny Ayers
                          Message 12 of 22 , Feb 3, 2011
                            On 3 February 2011 10:21, Nathan <nathan@...> wrote:

                            > Danny, you simply place the query engine on the client side, operating over
                            > a web of linked data, using conditional GETs and HTTP caching, each data
                            > source can be ACL controlled in a granular fashion that way, and it's /very/
                            > network friendly :)

                            A very good point - a lot of good stuff can be done in a client
                            (agent), although it's a slightly unfamiliar pattern for developers
                            used to seeing the browser as the only client. (My personal view of
                            a 21st-century intelligent agent - agent in the AI sense - is that
                            of a relatively stupid little unit composed of an HTTP client,
                            (access from) an HTTP server, and a little bit of code/wiring to
                            express its business rules.)

                            My reservation on the ACL front would be that currently a great deal
                            of per-source manual configuration would be required to set something
                            like this up (and modify it as requirements and sources evolve).

                            > ps: not many people do this currently, but you can, virtuoso for example
                            > enables this, and there are js implementations in the works to encourage
                            > this pattern. (full read write ACL controlled web of linked data is still at
                            > proto stage)

                            Right. I wasn't aware of how far Virtuoso had got on this, but it's
                            good to hear that work is in progress.

                            Cheers,
                            Danny.



                            --
                            http://danny.ayers.name
                          • Bob Ferris
                            Message 13 of 22 , Feb 3, 2011
                              Hi,

                              @ all: I thought we had already pointed out most of the parts
                              mentioned here a bit earlier, no? Cf.

                              - http://tech.groups.yahoo.com/group/rest-discuss/message/17279
                              - http://tech.groups.yahoo.com/group/rest-discuss/message/17258
                              - http://tech.groups.yahoo.com/group/rest-discuss/message/17264
                              - http://tech.groups.yahoo.com/group/rest-discuss/message/17266

                              ;)

                              Apologies for repeating myself here, but I think we shouldn't go
                              round in circles, should we? - Although, yes, such a conversation
                              is always a bit difficult. It's hard to deliver the intended
                              meaning of a message from its sender to its receiver(s).
                              Anyway, thanks a lot for having this nice discussion here.

                              Cheers,


                              Bob

                              PS: please pay attention to the referenced sources in the posts
                              PPS: the third reference in the second post should be
                              http://purl.org/ontology/is/core# or
                              http://infoserviceonto.smiy.org/2010/06/22/welcome/ ;) ("legacy system"
                              (existing non-Semantic-Web system) provenance information is crucial at
                              the moment)


                            • Nathan
                              Message 14 of 22 , Feb 3, 2011
                                Bob Ferris wrote:
                                > @ all: I thought we had already pointed out most of the parts
                                > mentioned here a bit earlier, no?

                                Yes, but not everybody has time to read lots of long posts, so a
                                quick summary can sometimes suffice, together with taking forks in
                                the discussion off-list. Nothing wrong with that.

                                It is good to have a fuller discussion of the referenced issues
                                here for the archives and anybody following along, though :)

                                Best,

                                Nathan

                                ps: yes, I'm noting the irony in that I can often write loads of long
                                posts in quick succession ;)
                              • Eric J. Bowman
                                Message 15 of 22 , Feb 3, 2011
                                  Nathan wrote:
                                  >
                                  > Eric J. Bowman wrote:
                                  > > There is one way I can think of to use SPARQL queries in a REST app,
                                  > > which is to POST or PUT a representation as
                                  > > application/sparql-query to some URI. Dereferencing that URI
                                  > > executes the query as a stored procedure, returning
                                  > > application/sparql-results+xml by default, but can also return the
                                  > > original query with Accept: application/sparql- query.
                                  >
                                  > Indeed :) one little question though, what happens when somebody GETs
                                  > the URI?
                                  >

                                  "Dereferencing that URI executes the query..."

                                  >
                                  > For example, given such a scenario I'd quite like to send people back
                                  > some HTML, with a form in it, that allowed them to run test SPARQL
                                  > queries and get back the "raw results", say by putting the query in a
                                  > form element and submitting the form. Sound feasible / RESTful? if
                                  > so, POST/PUT or GET?
                                  >

                                  You're confusing me. ;-) My scenario executes one stored SPARQL query
                                  (by default, unless Accept: application/sparql-query) at a fixed URI.
                                  POST or PUT can create that URI (depending on whether the user-agent or
                                  the origin server assigns the URI), by uploading the query as a file.
                                  PUT may be used to replace the query, i.e. edit the file. Note that
                                  PUT will only Allow: application/sparql-query -- you can't edit the
                                  result, PUT that back, and expect the server to reformulate the query.

                                  Standard REST design pattern, I've used it with PHP, XQuery, SSJS, JSP,
                                  ASP... not always self-descriptively, as PHP etc. lack media types, and
                                  always access-restricted for methods other than GET/HEAD/OPTIONS.

                                  You're saying you want GET to return HTML results with a form? Fine,
                                  add text/html to the conneg mix, return the results with a form in
                                  that representation, pre-fill the textarea with the current raw SPARQL
                                  query, and instruct the user-agent to PUT application/sparql-query
                                  to the URI upon submission.
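
                                  Roughly this kind of dispatch on the server, then
                                  (a sketch only; media-type handling and markup are
                                  abbreviated):

                                    def representation(accept, raw_query, run):
                                        # run() executes the stored SPARQL query
                                        if "application/sparql-query" in accept:
                                            return ("application/sparql-query",
                                                    raw_query)
                                        if "text/html" in accept:
                                            form = ('<form><textarea name="query">'
                                                    '%s</textarea></form>' % raw_query)
                                            # submitting PUTs sparql-query back
                                            return ("text/html", form)
                                        return ("application/sparql-results+xml",
                                                run(raw_query))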

                                  >
                                  > ps: a little confused after reading the above "one way I can think of
                                  > to use SPARQL queries in a REST app, which is to POST or.." and the
                                  > mail you sent immediately before it saying "Please don't confuse a
                                  > post about how POST isn't unRESTful, as saying that it's ever even
                                  > remotely OK to use POST as a retrieval method." - I'm probably
                                  > missing something obvious here, or perhaps a subtlety in
                                  > interpretation.
                                  >

                                  Yes, there's a nuance here that will lead some folks to believe that my
                                  example is no different than Mike's and resembles the SPARQL endpoint
                                  I'm griping about, and conclude that I've contradicted myself when I
                                  haven't. I'm not using POST to execute queries, SPARQL syntax hasn't
                                  leaked out into my URIs, and my API is *somewhat* self-documenting, in
                                  that the query isn't entirely opaque when presented with the results it
                                  generates.

                                  -Eric
                                • Mark Baker
                                  Message 16 of 22 , Feb 3, 2011
                                    Hi Alistair,

                                    On Thu, Feb 3, 2011 at 3:52 AM, Alistair Miles <alimanfoo@...> wrote:
                                    > What we really wanted was to be able to place a hard limit
                                    > on the amount of resources any one query could consume. A simple way to do
                                    > this might be to kill queries that took longer than X seconds, à la SimpleDB
                                    > [3]. My colleague Graham Klyne and I had a chat with Andy Seaborne about doing
                                    > this with Jena TDB, which wasn't possible at the time, and Andy seemed to
                                    > think it was doable (?) but there were details around how TDB executed queries
                                    > and also how TDB's various optimisers worked that I didn't fully understand,
                                    > and we never had the time to follow this through. I haven't done any recent
                                    > work on SPARQL, so it may be that there are query engines out there that
                                    > support this kind of thing out of the box.

                                    That would certainly work in the sense of bringing the cost down, but
                                    it would be a shadow of a SPARQL endpoint from the point of view of
                                    client expectations, no? Its proper functioning would be dependent on
                                    far too many variables that the client has no control over.

                                    That could be remedied by the publisher documenting a set of queries
                                    which it can guarantee will complete in a reasonable time, because it
                                    has optimized specifically for them (indexes, caching, etc..) ... but
                                    then that's exactly what they'd be doing if they put an HTTP interface
                                    in front of that data.

                                    > So I guess I'm saying, you may be right, but I wouldn't discount the viability
                                    > of open SPARQL endpoints just yet; I think the jury's still out.

                                    I'd be happy to be proven wrong because it would clearly be awesome to
                                    be able to use SPARQL over the 'net. Alas, everything I know about
                                    the Web tells me I'm not.

                                    Mark.
                                  • Alistair Miles
                                    Message 17 of 22 , Feb 4, 2011
                                      Hi Mark,

                                      On Thu, Feb 03, 2011 at 03:25:42PM -0500, Mark Baker wrote:
                                      > Hi Alistair,
                                      >
                                      > On Thu, Feb 3, 2011 at 3:52 AM, Alistair Miles <alimanfoo@...> wrote:
                                      > > What we really wanted was to be able to place a hard limit
                                      > > on the amount of resources any one query could consume. A simple way to do
                                      > > this might be to kill queries that took longer than X seconds, à la SimpleDB
                                      > > [3]. My colleague Graham Klyne and I had a chat with Andy Seaborne about doing
                                      > > this with Jena TDB, which wasn't possible at the time, and Andy seemed to
                                      > > think it was doable (?) but there were details around how TDB executed queries
                                      > > and also how TDB's various optimisers worked that I didn't fully understand,
                                      > > and we never had the time to follow this through. I haven't done any recent
                                      > > work on SPARQL, so it may be that there are query engines out there that
                                      > > support this kind of thing out of the box.
                                      >
                                      > That would certainly work in the sense of bringing the cost down, but
                                      > it would be a shadow of a SPARQL endpoint from the point of view of
                                      > client expectations, no? Its proper functioning would be dependent on
                                      > far too many variables that the client has no control over.
                                      >
                                      > That could be remedied by the publisher documenting a set of queries
                                      > which it can guarantee will complete in a reasonable time, because it
                                      > has optimized specifically for them (indexes, caching, etc..) ... but
                                      > then that's exactly what they'd be doing if they put an HTTP interface
                                      > in front of that data.

                                      That's a good point. But I do wonder if there is still a middle ground
                                      worth a bit of exploration. By that I mean, for any given endpoint, based
                                      on my experience with Jena TDB and the FlyBase dataset (~180m triples) [1],
                                      there will probably still be quite a large space of possible queries that
                                      execute at low cost, even without the service making indexing or caching
                                      optimisations specific to the data.

                                      E.g., try doing the example search at
                                      http://openflydata.org/flyui/build/apps/expressionbygenebatch/ with Firebug
                                      open to see the underlying SPARQL queries. These queries are not trivial,
                                      but they usually complete within a few seconds (some are sub-second), and
                                      the endpoints are all hosted on a modest EC2 m1.small instance. No specific
                                      optimisations were made for any of these endpoints, beyond the use of the
                                      generic TDB statistics-based optimiser.

                                      (In case you were wondering, those are fruit fly embryos and testes you're
                                      looking at :)

                                      Don't get me wrong, I'm not trying to claim SPARQL will revolutionise the
                                      Web, and I haven't done any work with SPARQL since 2009, so I have nothing
                                      invested in it.

                                      But I do wonder if, for people who have an interesting dataset that they'd
                                      like to share with others, exposing their dataset via a SPARQL endpoint
                                      would be worthwhile, even if they limited resource usage. I.e., the data
                                      publisher would say, "here's my data, you can execute x queries per second,
                                      any queries longer than y seconds will get killed, here's some statistics
                                      about the data to help you figure out what's there, go explore". Any third
                                      parties interested in re-using the data could then try a few SPARQL queries to
                                      see if they were efficient, and if so, query the SPARQL endpoint directly in
                                      their mashup/... application. If the queries they needed turned out not to be
                                      so efficient, then they could begin a dialogue with the data provider about
                                      an HTTP interface that is optimised for a particular set of requirements,
                                      or they could harvest (via the SPARQL endpoint?), cache and index the data
                                      they need and do their own optimisation.
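
                                      As a very rough sketch of what enforcing that kind of policy might look
                                      like in front of a store -- the endpoint URL, limits and error handling
                                      below are invented, and a timeout like this only bounds how long the
                                      front end waits; actually cancelling the running query needs support
                                      inside the store itself, as discussed above:

                                      # A naive sketch of "x queries per second, kill after y seconds", sitting
                                      # between incoming requests and an internal SPARQL store.
                                      import time
                                      from socket import timeout as SocketTimeout
                                      from urllib.error import URLError
                                      from urllib.parse import quote
                                      from urllib.request import Request, urlopen

                                      STORE = "http://localhost:3030/ds/sparql"   # hypothetical internal endpoint
                                      MAX_QPS = 2          # "you can execute x queries per second"
                                      MAX_SECONDS = 10     # "any queries longer than y seconds will get killed"
                                      _last_accepted = {}  # client id -> time of last accepted query

                                      class QueryRefused(Exception):
                                          pass

                                      def run_query(client_id, sparql):
                                          now = time.time()
                                          if now - _last_accepted.get(client_id, 0.0) < 1.0 / MAX_QPS:
                                              raise QueryRefused("rate limit exceeded, try again shortly")
                                          _last_accepted[client_id] = now
                                          req = Request(STORE + "?query=" + quote(sparql),
                                                        headers={"Accept": "application/sparql-results+json"})
                                          try:
                                              # Bounds how long *we* wait, not the query's own execution time.
                                              return urlopen(req, timeout=MAX_SECONDS).read()
                                          except (SocketTimeout, URLError):
                                              raise QueryRefused("query exceeded the %ss limit" % MAX_SECONDS)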

                                      I think this is especially interesting where a dataset is widely
                                      applicable. E.g., FlyBase stores reference data on the fruit fly genome,
                                      which is central to genomic research in Drosophila and is re-used in an
                                      extremely diverse range of applications. FlyBase serve their community
                                      better by providing more flexible interfaces to their data, because they
                                      cannot possibly predict all requirements. (In fact, FlyBase currently
                                      provide an SQL endpoint on a best-effort basis, which anyone who knows
                                      where to find it can use.)

                                      And providing a query endpoint is a nice way of lowering the costs for third
                                      parties re-using your data. E.g., if it's a SPARQL endpoint, then re-using
                                      the data is as simple as writing a bit of Python/Perl/..., or putting
                                      some HTML and JavaScript on a web server, with very low infrastructure cost
                                      or complexity. This matters most where there are lots of small-scale
                                      data re-users that don't have access to hosting infrastructure, which is
                                      particularly the case in biological research.
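
                                      For instance, pulling results out of an endpoint needs little more than
                                      an HTTP GET with the query in it. A minimal sketch, with a placeholder
                                      endpoint and query:

                                      # Minimal client: send a SPARQL SELECT over HTTP, read the JSON results.
                                      # The endpoint and query below are placeholders, not a real service.
                                      import json
                                      from urllib.parse import quote
                                      from urllib.request import Request, urlopen

                                      ENDPOINT = "http://example.org/sparql"
                                      QUERY = """
                                      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
                                      SELECT ?thing ?label WHERE { ?thing rdfs:label ?label } LIMIT 10
                                      """

                                      req = Request(ENDPOINT + "?query=" + quote(QUERY),
                                                    headers={"Accept": "application/sparql-results+json"})
                                      results = json.loads(urlopen(req, timeout=30).read())
                                      for row in results["results"]["bindings"]:
                                          print(row["thing"]["value"], row["label"]["value"])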

                                      Having said all that, I've found a fair few queries that SPARQL is just crap
                                      at, and that will never work without data-specific indexing and caching. Graham
                                      Klyne recently got some funding to work with the Jena guys on extending
                                      SPARQL endpoints with multiple data-specific indexes [2]; that could be
                                      interesting, and I'm sure others are working in the same space.

                                      A few shades of grey worth talking about here?

                                      Cheers,

                                      Alistair

                                      [1] http://code.google.com/p/openflydata/wiki/FlyBaseMilestone3
                                      [2] http://code.google.com/p/milarq/

                                      --
                                      Alistair Miles
                                      Head of Epidemiological Informatics
                                      Centre for Genomics and Global Health <http://cggh.org>
                                      The Wellcome Trust Centre for Human Genetics
                                      Roosevelt Drive
                                      Oxford
                                      OX3 7BN
                                      United Kingdom
                                      Web: http://purl.org/net/aliman
                                      Email: alimanfoo@...
                                      Tel: +44 (0)1865 287669
                                    • Danny Ayers
                                      The Talis Platform (http://talis.com/platform) is a Software as a Service system providing SPARQL-capable RDF stores alongside stores for arbitrary content
                                      Message 18 of 22 , Feb 4, 2011
                                        The Talis Platform (http://talis.com/platform) is a Software as a Service system providing SPARQL-capable RDF stores alongside stores for arbitrary content (HTML, blobs, whatever). It's particularly relevant here because access to the Platform is solely through HTTP (and, I believe, generally RESTful). I'm not up to date on recent developments, so I sent a ping on Twitter [1]; Sam (cc'd) responded:

                                        [[
                                        At the moment, the query throttling in the Platform works rather naively. Just like Alistair with openflydata, we've restricted some language features. Some of this is explicit; some SPARQL 1.1 features we have enabled (like aggregates etc.), others we have disabled (property paths). Also, we don't support any extension or property functions at the moment, so I guess that restricts quite a lot of *potential* functionality. We also do the other thing that Alistair talks about in the mail thread: we time how long each query takes to execute and terminate those that run over a certain threshold, currently 30 seconds. Obviously, this is at best crude and is something we'd like to improve. One major shortcoming currently is that a terminated query returns just an error response, no results. A patch has recently been submitted to ARQ to allow terminated queries to at least return the results they've already got; Paolo is working on getting that accepted now.

                                        I know it's a bit apples & oranges, but do you think the SPARQL Uniform HTTP Protocol [3] is relevant to the thread on rest-discuss?
                                        ...
                                        [1] http://twitter.com/#!/danja/statuses/33094970680283136
                                        [2] https://issues.apache.org/jira/browse/JENA-29
                                        [3] http://www.w3.org/TR/sparql11-http-rdf-update/
                                        ]]
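
                                        An aside from me rather than Sam: faced with that kind of throttling,
                                        about all a client can do is keep its requests modest and bound its own
                                        waiting -- e.g. always send a LIMIT and back off when a query gets cut
                                        off. A rough, hypothetical sketch (no real endpoint implied):

                                        # Hypothetical client behaviour against a throttled endpoint: cap result
                                        # size with LIMIT, bound our own wait, retry once asking for much less
                                        # if the query gets refused or cut off server-side.
                                        import json
                                        from socket import timeout as SocketTimeout
                                        from urllib.error import URLError
                                        from urllib.parse import quote
                                        from urllib.request import Request, urlopen

                                        ENDPOINT = "http://example.org/sparql"   # placeholder, not a real service

                                        def select(pattern, limit=1000, timeout=30):
                                            query = "SELECT * WHERE { %s } LIMIT %d" % (pattern, limit)
                                            req = Request(ENDPOINT + "?query=" + quote(query),
                                                          headers={"Accept": "application/sparql-results+json"})
                                            return json.loads(urlopen(req, timeout=timeout).read())

                                        def select_with_backoff(pattern):
                                            try:
                                                return select(pattern)
                                            except (SocketTimeout, URLError):
                                                return select(pattern, limit=100, timeout=10)

                                        if __name__ == "__main__":
                                            results = select_with_backoff("?s ?p ?o")
                                            print(len(results["results"]["bindings"]), "rows")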

                                        --
                                        http://danny.ayers.name

                                      • Bob Ferris
                                        Hi, I recently became aware of Fuseki [1], a SPARQL server. The claim on the wiki is: It provides the REST-style SPARQL HTTP Update, and SPARQL Query and SPARQL
                                        Message 19 of 22 , Feb 4, 2011
                                          Hi,

                                          I recently became aware of Fuseki [1], a SPARQL server. The claim on the
                                          wiki is:

                                          "It provides the REST-style SPARQL HTTP Update, and SPARQL Query and
                                          SPARQL Update using the SPARQL protocol over HTTP."

                                          I think the "hypermedia as the engine of application state" constraint
                                          is not fulfilled, or?
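
                                          For context, the update part of that claim is essentially plain HTTP
                                          verbs against graph URIs -- roughly as in the sketch below (the server
                                          address and graph name are made up) -- which gives you a uniform
                                          interface, but not, by itself, any hypermedia:

                                          # Rough illustration of the "REST-style" update part: HTTP verbs against
                                          # a graph resource. Addresses below are invented; real paths may differ.
                                          from urllib.parse import quote
                                          from urllib.request import Request, urlopen

                                          STORE = "http://localhost:3030/ds/data"   # hypothetical graph store
                                          GRAPH = STORE + "?graph=" + quote("http://example.org/my-graph")

                                          TURTLE = b"<http://example.org/a> <http://example.org/b> <http://example.org/c> ."

                                          # Replace the graph's contents.
                                          urlopen(Request(GRAPH, data=TURTLE, method="PUT",
                                                          headers={"Content-Type": "text/turtle"}))

                                          # Read it back.
                                          print(urlopen(Request(GRAPH, headers={"Accept": "text/turtle"})).read().decode())

                                          # Remove it.
                                          urlopen(Request(GRAPH, method="DELETE"))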

                                          Cheers,


                                          Bob

                                          [1] http://openjena.org/wiki/Fuseki
                                        • Eric J. Bowman
                                          ... Well, that's one of 'em. In REST circles, you'll encounter the phrase "HTTP != REST" which means that REST is a subset of the things you can do with HTTP,
                                          Message 20 of 22 , Feb 4, 2011
                                            Bob Ferris wrote:
                                            >
                                            > I think the "hypermedia as the engine of application state"
                                            > constraint is not fulfilled, or?
                                            >

                                            Well, that's one of 'em. In REST circles, you'll encounter the phrase
                                            "HTTP != REST" which means that REST is a subset of the things you can
                                            do with HTTP, not the entire set of things you can do with HTTP. 99%
                                            of "REST APIs" out there are really HTTP APIs, all hail the power of
                                            the buzzword...

                                            -Eric
                                          • Nathan
                                            ... and weirdly HTTP is not a superset of REST, as in you can't do everything with HTTP that REST indicates you could/should (the mismatches) fair comment?
                                            Message 21 of 22 , Feb 4, 2011
                                              Eric J. Bowman wrote:
                                              > Bob Ferris wrote:
                                              >> I think the "hypermedia as the engine of application state"
                                              >> constraint is not fulfilled, or?
                                              >>
                                              >
                                              > Well, that's one of 'em. In REST circles, you'll encounter the phrase
                                              > "HTTP != REST" which means that REST is a subset of the things you can
                                              > do with HTTP, not the entire set of things you can do with HTTP. 99%
                                              > of "REST APIs" out there are really HTTP APIs, all hail the power of
                                              > the buzzword...

                                              and weirdly HTTP is not a superset of REST, as in you can't do
                                              everything with HTTP that REST indicates you could/should (the mismatches)

                                              fair comment?
                                            • Eric J. Bowman
                                              ... Absolutely. But, assuming a RESTful design, HTTP could be replaced in the future with Waka, HTTP 2, or whatever -- clearing up those mismatches inherent
                                              Message 22 of 22 , Feb 4, 2011
                                                Nathan wrote:
                                                >
                                                > and weirdly HTTP is not a superset of REST, as in you can't do
                                                > everything with HTTP that REST indicates you could/should (the
                                                > mismatches)
                                                >
                                                > fair comment?
                                                >

                                                Absolutely. But, assuming a RESTful design, HTTP could be replaced in
                                                the future with Waka, HTTP 2, or whatever -- clearing up those
                                                mismatches inherent in HTTP 1.1 without actually changing the API.

                                                -Eric