
15720 Re: [rest-discuss] Determining which Media type for post/put

  • Eric J. Bowman
    Jun 17 1:16 PM
      Peter Williams wrote:
      >
      > > Using Content-Location, we can associate one application/xhtml+xml
      > > variant with multiple combinations of selection headers, i.e. a
      > > one-to-many mapping.  This can't be done without some means of
      > > distinguishing one variant from another, without sniffing content.
      >
      > Providing a `content-location` allows more efficient caching by
      > allowing mapping a variety of selection headers to a single entity in
      > caches. Agreed. On the other hand, vigorous use of `etag` would
      > provide similar improvements to the cache hit rate. It is a big step
      > from "Content-Location can improve cache hit rates" to, "conneg is
      > useless without Content-Location".
      >

      My position is that assigning URIs to variants is both a REST constraint
      and HTTP best-practice. I haven't said "conneg is useless without
      Content-Location," particularly as I've kept saying "except for
      caching"... I get your meaning, though, but "Content-Location can
      improve cache hit rates" is your strawman, not my position.

      Over the course of the thread, I may have staked out too rigid a
      position, that the only way to distinguish variants from one another is
      by assigning Content-Location URIs to them. You are correct, Etag may
      be used to distinguish variants, and this can increase cache hit rates
      even when Content-Location is absent.

      But, this does not follow REST, so it does not change my advice...

      >
      > A conforming cache will not respond with an inappropriate
      > representation if the server sends an appropriate `vary` header.
      >

      OK. I was giving one example of aberrant cache behavior, which doesn't
      apply to the specifics of using Etag in combination with Vary. My way
      of doing things is to make my system compliant with HTTP 1.0 caches to
      the fullest extent possible, because last I heard there were still
      plenty of HTTP 1.0 caches deployed out there on the real-world Web.

      So to my way of thinking, conneg should work independently of caching
      scheme, i.e. Etag or Expires both work when Vary is combined with
      Content-Location... which is probably another reason for that SHOULD.

      >
      > (Though it might miss a valid chance to serve a cached entity.)
      >

      The other drawback to relying on Etag to cover for a missing Content-
      Location, is that on the real-world, anarchically-scalable Web, myriad
      cases exist where a cache may legitimately decide to serve a stale
      representation. This loss of control is the tradeoff to caching. By
      omitting Content-Location, you're preventing the cache from identifying
      the proper variant to send, forcing it to contact the origin server,
      which presumably it had good reason to avoid doing (like if that server
      is unavailable from the cache's location). When Content-Location is
      omitted, much uncertainty is introduced which is otherwise avoided by
      following the SHOULD.

      >
      > Private caches at the user agent are less susceptible to selection
      > criteria explosion. Repeated requests from a single user agent are
      > likely to all be quite similar. In my experience private caches are
      > far more important than caching intermediates, anyway.
      >

      My experience disagrees with your experience. When I first started
      doing Web development in late 1993, it was by downloading Mosaic via my
      Compuserve account, and creating pages on my local filesystem. My
      first experience with HTTP was in 1994, after I'd opened my own ISP. I
      was an early member of the Colorado Internet Cooperative Association,
      whose board consisted of most of the authors of "UNIX System
      Administration Handbook".

      One of whom was Evi (who had a second home in Steamboat Springs, but
      went with my non-coop competition because I only offered PPP and she
      demanded CSLIP), who, in her position as a professor at CU-Boulder, was
      instrumental in the student-led development of squid. The first anyone
      really ever heard of squid was at a coop meeting, to an ISP-dominated
      audience. So in my (heavily-ISP-weighted) experience, shared caches
      are far more important than private.

      But, this is just one preference vs. another. I do not take the view
      that REST constraints which don't apply to a particular system, are
      irrelevant. Thus, constraints intended to increase visibility to
      intermediary components are still part of the style, even when we only
      care about private caches which don't require us to follow such
      constraints.

      You are presenting an edge case of not caring about shared caches,
      showing that Content-Location isn't required. I cannot be persuaded
      that any edge case nullifies the best-practice advice I'm giving. I
      only agree that your edge case exists, not that you're better off by
      not meeting the identification of resources constraint.

      REST is the Platonic Ideal for the long-term development of a system --
      just because you're setting Cache-Control: private today, doesn't mean
      you shouldn't be able to change it tomorrow, by just changing the Cache-
      Control header. If your system wasn't designed with a long-term view
      of REST, then you can't just change Cache-Control, you must also add
      Content-Location.
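
      A minimal sketch of that point, assuming a hypothetical variant URI
      /report.xhtml and a hypothetical helper named response_headers (neither
      comes from any real system). If Content-Location is already part of
      every negotiated response, moving from private to shared caching only
      touches the Cache-Control value:

```python
def response_headers(variant_uri, cache_control):
    """Build headers for a negotiated response (illustrative only)."""
    return {
        "Content-Location": variant_uri,  # present from day one
        "Vary": "Accept",
        "Cache-Control": cache_control,
    }


# Today: private caching only.
today = response_headers("/report.xhtml", "private")
# Tomorrow: shared caches allowed; the max-age value is an arbitrary example.
tomorrow = response_headers("/report.xhtml", "public, max-age=3600")

# Only one header differs between the two deployments.
changed = {name for name in today if today[name] != tomorrow[name]}
print(changed)
```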

      So what I'm saying is, start with Content-Location even if you don't
      see an immediate need for it. By making it your habit to follow this
      best practice, you'll never have to regret having avoided it. Instead of
      tailoring my solutions to the specific needs of the system I'm
      developing, I follow REST and develop a Uniform Interface, because I
      know that works in the present and will continue to work in the future,
      so I won't have to re-architect any system in response to its evolving
      needs. Tweaking an existing system's headers is easier than adding new
      headers.

      >
      > `content-location` is a terribly useful header. Using it does
      > increase the cache hit rates for negotiated resources. However,
      > skipping `content-location` in a negotiated response does not violate
      > any of the REST constraints that i can see.
      >

      Variants are resources. As such, REST requires them to be identified,
      in order for one variant to be distinguishable from another. Etag does
      not meet this constraint, because Etags are transient, in that they
      change over time for any given variant. The purpose of
      assigning a URI is to declare a static mapping. This is why assigning
      URIs to variants is a best practice -- provide one URI for a set of
      Etagged entities to map to.
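
      As a rough illustration, assuming Etags are derived from a hash of the
      entity body (one common choice, not something HTTP requires) and using
      a hypothetical variant URI /report.xhtml, the Etag changes whenever the
      variant is revised while the Content-Location stays put:

```python
import hashlib

VARIANT_URI = "/report.xhtml"  # hypothetical static URI for one variant


def variant_headers(body: bytes) -> dict:
    """Headers for one revision of the variant (illustrative only)."""
    # Assumption: a strong Etag built from a truncated SHA-1 of the body.
    etag = '"' + hashlib.sha1(body).hexdigest()[:16] + '"'
    return {
        "Content-Location": VARIANT_URI,
        "ETag": etag,
        "Vary": "Accept",
    }


monday = variant_headers(b"<p>draft</p>")
friday = variant_headers(b"<p>final</p>")

# The Etag is transient across revisions; the variant URI is not.
print(monday["ETag"] != friday["ETag"])
print(monday["Content-Location"] == friday["Content-Location"])
```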

      In HTTP, REST's requirement of assigning URIs to variants is reflected
      in the SHOULD about Content-Location. So to apply REST in HTTP, the
      SHOULD is followed. You are pointing to an edge case, where avoiding
      Content-Location can still be made to work. But you haven't explained
      why minting those URIs is undesirable, i.e. "works without it" does not
      justify avoiding Content-Location. "Compression" justifies avoiding
      Content-Location, i.e. ignoring the SHOULD, but I still haven't seen
      any other case where that SHOULD shouldn't be taken as a MUST (if, that
      is, you're following REST and applying the identification of resources
      constraint).

      I still wouldn't want to touch a non-compression conneg system that
      avoids Content-Location with a ten-foot pole. There is no simpler way
      to develop and maintain a conneg system, than to assign URIs to
      variants (except for compression), even if those URIs aren't exposed
      beyond the firewall. I've developed enough conneg systems to know that
      at some point, most likely more than one point, I will need to examine
      variants directly, bypassing the negotiation mechanism entirely (as
      opposed to testing the mechanism by altering selection headers).
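
      A sketch of what that looks like, under stated assumptions: the paths
      /report, /report.xhtml and /report.html, the entity bodies, and the
      handle function are all hypothetical, and the Accept parsing is
      deliberately simplified (exact media types and */* only, falling back
      to a default rather than answering 406). The negotiated URI and the
      variant URIs are served by the same dispatcher, so a variant can be
      fetched directly without touching selection headers:

```python
# Hypothetical variants, each with its own URI.
ENTITIES = {
    "/report.xhtml": ("application/xhtml+xml", b"<html xmlns='x'>report</html>"),
    "/report.html": ("text/html", b"<html>report</html>"),
}
# Negotiated resource -> its variant URIs; the first one is the default.
NEGOTIATED = {"/report": ["/report.xhtml", "/report.html"]}


def preferred_types(accept):
    """Media types from an Accept header, highest q first, q=0 dropped."""
    ranked = []
    for position, item in enumerate(accept.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranked.append((-q, position, media_type))
    ranked.sort()
    return [m for neg_q, _, m in ranked if neg_q < 0]


def handle(path, accept="*/*"):
    """Return (status, headers, body) for a GET (illustrative only)."""
    if path in ENTITIES:
        # Direct access to a variant: no negotiation involved.
        media_type, body = ENTITIES[path]
        return 200, {"Content-Type": media_type}, body
    if path in NEGOTIATED:
        candidates = NEGOTIATED[path]
        chosen = candidates[0]
        for wanted in preferred_types(accept):
            if wanted == "*/*":
                break  # anything goes: keep the default
            match = next((u for u in candidates if ENTITIES[u][0] == wanted), None)
            if match is not None:
                chosen = match
                break
        media_type, body = ENTITIES[chosen]
        headers = {
            "Content-Type": media_type,
            "Content-Location": chosen,  # identifies the variant served
            "Vary": "Accept",
        }
        return 200, headers, body
    return 404, {}, b""
```

      Here handle("/report", "application/xhtml+xml") and
      handle("/report", "application/xhtml+xml, */*;q=0.1") both name
      /report.xhtml in Content-Location, and handle("/report.xhtml")
      returns that same entity with no negotiation at all.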

      To me, this is a stronger argument than any edge case where Content-
      Location isn't technically needed by a caching scheme -- I don't care,
      assign URIs to your variants anyway, because REST requires it, and
      because it would be insane to develop and maintain a conneg system
      without doing so (except for compression). Spoken from experience.

      There is still no downside to assigning URIs to variants, so I still
      don't see the point in examining edge cases. Why *not* assign URIs to
      variants? What is it we're so desperately trying to avoid here, that we
      would disregard best practice by ignoring RFC 2616's SHOULD? Not
      caring about shared caching isn't a reason, particularly given that
      this is rest-discuss, where our concern is targeting the sweet-spot in
      the deployed Web which allows anarchic scalability (shared caching).

      The identification of resources constraint, applied in HTTP by using
      Content-Location to assign URIs to variants, allows for anarchic
      scalability. Edge cases where that level of scalability isn't
      required, are not sufficient reason not to apply the constraint anyway,
      and don't change best practice. Best practice in REST is to apply REST
      constraints and follow HTTP. Assigning URIs to variants is required by
      REST and strongly recommended as best practice by HTTP. Even if
      avoiding this has no downside today, REST development means not assuming
      that tomorrow's needs are the same as today's; design for the future.

      So the only advice I can give about assigning URIs to variants, is to
      do just exactly that. There is no REST argument *against* doing so,
      and a key REST constraint will be met by following this best practice.
      This really is as simple as the black-and-white clarity of the advice I
      keep giving. Even if one doesn't understand it, I promise you that it's
      far easier to learn REST by implementing best practices and learning
      from them, than trying to learn REST by avoiding best practices in one's
      implementations, then trying to reconcile the results with REST ex post
      facto.

      REST should be any Web system's long-term goal. I don't fault a system
      for not implementing a constraint, if applying the constraint carries
      an immediate cost which outweighs the constraint's long-term benefits.
      This is not such a case. Identification of resources is fundamental,
      and has no costs to implement. I would even say that to avoid
      assigning URIs to variants, carries greater immediate costs (in terms
      of development hours alone) than are incurred by assigning them. So I
      still don't see any theoretical or cost-benefit reasons to avoid
      assigning URIs to variants.

      -Eric