Format & interpretation of URL fragments for JSON resources

  • Jacob Davies
    Message 1 of 5, Feb 26 4:34 PM
      I have a question regarding the use of URL fragments (the part after
      the # (hash) character in a standard URL) for navigating JSON
      resources. So far as I can see from some searches & investigation,
      there does not seem to be a firm consensus on the format and
      interpretation of them, and there is a fairly major problem with the
      most common suggestion I've seen, which is the interpretation of the
      fragment as a series of dot-delimited, URL-encoded keys to be used to
      navigate through a set of nested JSON objects and arrays.

      So, an example. The fragment:

      #foo.bar.0

      when used to navigate the JSON resource:

      {
          "foo" : {
              "bar" : [
                  "xyz"
              ]
          }
      }

      would refer to the value "xyz".

      This has the attractive feature of looking like the Javascript or Java
      dot-notation for navigating objects.
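To make the dot-delimited interpretation concrete, here is a minimal Python sketch of a naive resolver. The name `resolve_dot_fragment` is hypothetical, and treating a key as an integer index whenever the current node is an array is an assumption about how such resolvers usually behave, not something any spec quoted in this thread mandates.

```python
import json

def resolve_dot_fragment(doc, fragment):
    """Naive dot-delimited resolution (hypothetical helper): split on '.',
    then step into objects by member name and into arrays by index."""
    node = doc
    for key in fragment.lstrip("#").split("."):
        if isinstance(node, list):
            node = node[int(key)]  # assumption: array steps are decimal indexes
        else:
            node = node[key]       # object step: key is a member name
    return node

doc = json.loads('{"foo": {"bar": ["xyz"]}}')
print(resolve_dot_fragment(doc, "#foo.bar.0"))  # xyz
```

Note that this naive version has no way at all to express a key that itself contains a dot.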

      The problem is that the dot (period) is explicitly included in the list
      of unreserved characters in URL encoding (RFC 3986, section 2.3):

      http://tools.ietf.org/html/rfc3986#page-13

      "For consistency, percent-encoded octets [...] period (%2E) [...]
      should not be created by URI producers"

      So the simple statement of the format ("dot-delimited, URL-encoded
      keys") is either ambiguous or cannot accommodate keys containing
      periods.

      A simple example to illustrate:

      {
          "foo" : {
              "bar" : "xyz"
          },
          "foo.bar" : "abc"
      }

      Does the fragment #foo.bar refer to the value "xyz" or to "abc"?

      Obviously it is straightforward to replace the periods in keys with %2E
      and therefore distinguish between these fragments:

      #foo.bar - intended to refer to "xyz"
      #foo%2Ebar - intended to refer to "abc"
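One way to implement this %2E escape scheme is to split on literal dots first and only then percent-decode each key. In this minimal Python sketch (the function name is hypothetical), both fragments resolve as intended, provided nothing between the author and the resolver has rewritten `%2E` back to `.`:

```python
import json
from urllib.parse import unquote

def resolve_escaped_dot_fragment(doc, fragment):
    """Split on literal '.' first, then percent-decode each key, so a
    '%2E' inside a key becomes a literal '.' instead of a delimiter."""
    node = doc
    for raw in fragment.lstrip("#").split("."):
        key = unquote(raw)
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node

doc = json.loads('{"foo": {"bar": "xyz"}, "foo.bar": "abc"}')
print(resolve_escaped_dot_fragment(doc, "#foo.bar"))    # xyz
print(resolve_escaped_dot_fragment(doc, "#foo%2Ebar"))  # abc
```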

      But there are three problems with this procedure: two minor and one major.

      The first minor problem is that standard URL-encoding routines do not
      replace dots with the %2E escape. The second minor problem is that it
      makes it awkward to construct fragments by hand that refer to keys that
      contain dots.
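The first minor problem is easy to reproduce with Python's standard `urllib.parse.quote`: even with `safe=""` (which asks it to escape every reserved character), it escapes the slash but leaves the unreserved dot untouched.

```python
from urllib.parse import quote

print(quote("foo/bar", safe=""))  # foo%2Fbar  (reserved: escaped)
print(quote("foo.bar", safe=""))  # foo.bar    (unreserved: left alone)
```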

      The major problem is that this method of interpretation of a URL is
      explicitly disallowed. Quoting again from RFC 3986:

      "URIs that differ in the replacement of an unreserved character with
      its corresponding percent-encoded US-ASCII octet are equivalent: they
      identify the same resource."

      Clearly this is not true in the above example. Replacement of %2E with
      a period changes the interpretation of the fragment. Note that the
      word "unreserved" is significant in the above quote - the
      replacement of a reserved character by its URL-encoded counterpart IS
      allowed to make a difference in distinguishing between resources.

      So, I have a suggestion for an alternative format and interpretation,
      which is:

      "URL fragments contain a slash-delimited, URL-encoded list of keys
      used to navigate a JSON structure from the root".

      So, given the JSON resource:

      {
          "foo" : {
              "bar" : "xyz"
          },
          "foo.bar" : "abc",
          "foo/bar" : "123"
      }

      the contained values can be unambiguously referred to using the
      fragments:

      #foo/bar - "xyz"
      #foo.bar - "abc"
      #foo%2Fbar - "123"

      Slash IS a reserved character in URL encoding. This means, first,
      that we can legitimately treat the first and last examples above as
      referring to different resources; second, that standard URL-encoding
      routines will escape it correctly, so the wording of the format is
      unambiguous; and third, that keys containing dots can easily be used
      in URLs. In my experience such keys are far more common than keys
      containing slashes, and there have been several recent suggestions
      for using reversed domain names in dotted keys as an ad-hoc namespace
      mechanism in JSON, similar to Java package names, for instance:

      {
          "org.itemscript.Name" : "Jacob"
      }
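A minimal Python sketch of the slash-delimited proposal, assuming (as in a naive dot resolver) that keys step into arrays as integer indexes. Because the split on `/` happens before percent-decoding, a `%2F` inside a key never acts as a delimiter, and dotted keys need no escaping at all. Both function names are hypothetical.

```python
import json
from urllib.parse import quote, unquote

def encode_slash_fragment(keys):
    """Percent-encode each key (safe="" so '/' is escaped too), join with '/'."""
    return "#" + "/".join(quote(str(k), safe="") for k in keys)

def resolve_slash_fragment(doc, fragment):
    """Split on literal '/' first, then percent-decode each key."""
    node = doc
    for raw in fragment.lstrip("#").split("/"):
        key = unquote(raw)
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node

doc = json.loads('{"foo": {"bar": "xyz"}, "foo.bar": "abc", "foo/bar": "123"}')
print(resolve_slash_fragment(doc, "#foo/bar"))         # xyz
print(resolve_slash_fragment(doc, "#foo.bar"))         # abc
print(resolve_slash_fragment(doc, "#foo%2Fbar"))       # 123
print(encode_slash_fragment(["org.itemscript.Name"]))  # #org.itemscript.Name
```

Passing `safe=""` matters here: Python's default of `safe="/"` would leave the slash unescaped.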

      One final note: the use of an initial slash to indicate that the value
      is rooted at the top level of the JSON structure seems unnecessary,
      since fragment identifiers by definition are global to a given resource
      or document.

      Anyway, just some thoughts. I know that the dot-delimited fragment
      format already has some momentum, but I had to make a decision about
      which format to use for something I was working on recently, and after
      thinking about it (and using the dot-delimited format for a while) I
      found that the problems with dot-delimited were significant enough that
      I didn't use it. I do think a consistent interpretation of URL fragments
      in JSON resources would be quite useful though.

      --
      Jacob Davies
      jacob@...
    • Kris Zyp
      Message 2 of 5, Feb 26 8:06 PM
        [+restful-json]
        Jacob,
        You may already be aware of this, but a specification for the
        dot-delimited hash/fragment resolution mechanism is in the JSON Schema
        I-D (section 6.2.1) [1]. One thing to note is that you can specify
        alternate hash/fragment resolution mechanisms in the schema; the draft
        just defines dot-delimited as the default. However, we certainly do
        want the default to be legitimate. I'd be glad to change the draft to
        slashes if there is consensus that using slashes is more appropriate.
        However, based on prior conversations [2], I had thought there was
        agreement that the stipulations of RFC 3986 didn't need to be strictly
        applied to hashes, since they aren't transferred over the wire and
        don't identify resources (they identify internal parts of a resource,
        and the text you quoted from RFC 3986 refers to how resources are
        identified). I am certainly open to the idea that slashes might be
        better, but since dots are currently in use, I would only want to
        alter the JSON Schema draft if there is sufficient reason.

        [1] http://tools.ietf.org/html/draft-zyp-json-schema-01#section-6.2.1
        [2] http://groups.google.com/group/restful-json/browse_thread/thread/e3fd36625bb71d01

        Thanks,
        Kris

      • Fredag_d13
        Message 3 of 5, Feb 27 2:24 AM
          Hi Jacob

          I have implemented a way to navigate a JSON object in my PL/SQL JSON implementation. It's basically a subset of how you navigate JavaScript objects.

          The idea is that you can use either dot notation or square-bracket notation to extract content from your JSON structure.

          Dot notation is only used to extract object members, and it is really just shorthand for the square-bracket form.
          The path for your first example would be:

          foo.bar[0] or ["foo"].bar[0] or foo["bar"][0] or ["foo"]["bar"][0]

          To extract "abc" in your second example, use the path ["foo.bar"]; to
          extract "xyz", use the path foo.bar or foo["bar"].

          There's no way in my implementation to reach "abc" without the square-bracket notation.
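The PL/SQL implementation itself isn't shown in this thread, so the following is only a Python sketch of the path syntax described above, not a port of it. The regular expression, the `parse_path`/`resolve` names, and the restriction that quoted keys contain no `"` or escape sequences are all assumptions made for illustration.

```python
import re

# One path step: [.]name | ["quoted key"] | [index]
# Assumption: quoted keys contain no '"' and no escape sequences.
_STEP = re.compile(r'(\.?)([A-Za-z_$][\w$]*)|\["([^"]*)"\]|\[(\d+)\]')

def parse_path(path):
    """Turn e.g. 'foo["bar"][0]' into ['foo', 'bar', 0]."""
    steps, pos = [], 0
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"bad path at offset {pos}: {path!r}")
        dot, name, quoted, index = m.groups()
        if name is not None:
            # A bare name is only allowed first; later names need a leading dot.
            if bool(dot) != (pos > 0):
                raise ValueError(f"bad dot at offset {pos}: {path!r}")
            steps.append(name)
        elif quoted is not None:
            steps.append(quoted)      # ["key"]: may contain '.', '/', etc.
        else:
            steps.append(int(index))  # [n]: array index
        pos = m.end()
    return steps

def resolve(doc, path):
    node = doc
    for step in parse_path(path):
        node = node[step]
    return node

doc = {"foo": {"bar": ["xyz"]}}
for p in ['foo.bar[0]', '["foo"].bar[0]', 'foo["bar"][0]', '["foo"]["bar"][0]']:
    print(resolve(doc, p))  # xyz

doc2 = {"foo": {"bar": "xyz"}, "foo.bar": "abc"}
print(resolve(doc2, '["foo.bar"]'))  # abc
print(resolve(doc2, 'foo.bar'))      # xyz
```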

          I hope that could be useful for your URL fragments.

        • Jacob Davies
          Message 4 of 5, Feb 27 9:32 AM
            Thanks for the pointer to the discussion, that's what I was looking
            for. I guess I should also join the restful-json list.

            I don't think this is the end of the world one way or another,
            obviously! I think interoperability and consensus are more important
            than absolutely strict compliance. I guess my question was, "Is there
            a consensus?", and if there isn't a strong consensus, to raise a
            couple of points about the dot-delimited format and see what people
            thought.

            On resource identification, I think that's a reasonable reading of
            that section, but I'm not sure it was intended to mean that fragment
            identifiers should not follow the same rule. For instance, this
            paragraph:

            "For consistency, percent-encoded octets in the ranges of ALPHA
            (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
            underscore (%5F), or tilde (%7E) should not be created by URI
            producers."

            isn't talking about resources versus fragments, but just about URIs in general.

            The other problem, which is minor but makes for a more complex
            explanation of the format, is that the description of keys as
            URI-encoded isn't quite right. URI-encoding does not involve
            replacement of dots in the original string with %2E. So you have to
            say something like "URI-encoding with the additional replacement of
            dots with %2E".
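Written out as code, "URI-encoding with the additional replacement of dots with %2E" might look like the following Python sketch (the function names are hypothetical). The extra `.replace` is safe because `quote` never emits a `.` of its own; its escapes are always `%` followed by two hex digits.

```python
from urllib.parse import quote

def encode_dot_key(key):
    """URI-encode a key, then additionally replace '.' with '%2E';
    quote() alone leaves '.' untouched because it is unreserved."""
    return quote(key, safe="").replace(".", "%2E")

def encode_dot_fragment(keys):
    return "#" + ".".join(encode_dot_key(str(k)) for k in keys)

print(encode_dot_fragment(["foo.bar"]))     # #foo%2Ebar
print(encode_dot_fragment(["foo", "bar"]))  # #foo.bar
```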

            Both are pretty nitpicky points, I admit, but then interpreting URLs
            is always a pretty nitpicky process... I did have the JSON Schema
            proposal in mind, but didn't want to seem like I was complaining about
            one particular description - after all, the dot-delimited format was
            also how I approached this in the first place, and since I am not
            using JSON Schema myself I'm not sure it's my place to suggest
            changes. I fully appreciate the difficulties in changing a format
            that's already in use!

            One thought I just had is that you can reliably distinguish between
            the two formats if the slash-delimited format is changed to REQUIRE
            a leading slash. That is, because you require that the fragment be
            a list of dot-delimited URI-encoded(-with-dots-encoded) keys, you
            have already forbidden the use of an unencoded slash in the
            fragment, since a slash is a reserved character that must be
            URI-encoded. So the two formats don't necessarily need to conflict.
            I'm not saying that an implementation would have to understand both
            (although it could), but it could at least distinguish between them
            and give an error if it encounters the format it doesn't
            understand. I'll have to think about that... I said before that a
            leading slash was unnecessary because fragments refer to navigation
            from the top of a resource, but it is at least analogous to the
            leading slash in a filesystem path or in the path section of a
            URL.

            There is some potential for allowing a number of different,
            non-conflicting fragment resolution mechanisms, each indicated by
            the use of a reserved character as the first character in the
            fragment, which could be useful for other navigation mechanisms,
            JSONPath for instance. Again, those wouldn't conflict with the
            strict interpretation of the dot-delimited format.
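A hypothetical Python sketch of dispatching on the first character of the fragment: a leading `/` selects the slash-delimited format, any other reserved first character is rejected as an unrecognized mechanism (standing in for, say, a future JSONPath handler), and everything else falls back to the dot-delimited format with `%2E` escapes. The handler table and the set of characters treated as mechanism prefixes are assumptions for illustration.

```python
from urllib.parse import unquote

# Hypothetical registry: leading character -> function producing raw keys.
HANDLERS = {"/": lambda frag: frag[1:].split("/")}
# Assumption: other reserved characters allowed in a fragment act as prefixes.
OTHER_PREFIXES = "!$&'()*+,;=:@?"

def _walk(doc, keys):
    node = doc
    for key in keys:
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node

def resolve_fragment(doc, fragment):
    frag = fragment[1:] if fragment.startswith("#") else fragment
    handler = HANDLERS.get(frag[:1])
    if handler is not None:
        raw_keys = handler(frag)
    elif frag and frag[0] in OTHER_PREFIXES:
        raise ValueError(f"unsupported fragment syntax {frag[0]!r}")
    else:
        raw_keys = frag.split(".")  # dot-delimited, %2E for literal dots
    # In both formats, split on the delimiter first, then percent-decode.
    return _walk(doc, [unquote(k) for k in raw_keys])

doc = {"foo": {"bar": "xyz"}, "foo.bar": "abc", "foo/bar": "123"}
print(resolve_fragment(doc, "#/foo/bar"))    # xyz
print(resolve_fragment(doc, "#/foo%2Fbar"))  # 123
print(resolve_fragment(doc, "#foo%2Ebar"))   # abc
```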



            --
            jacob@...
          • Jacob Davies
            Message 5 of 5, Feb 27 4:23 PM
              One more note - I was looking at the section on normalization again:

              http://tools.ietf.org/html/rfc3986#section-6.2.2

              and this paragraph stood out:

              "The percent-encoding mechanism (Section 2.1) is a frequent source of
              variance among otherwise identical URIs. In addition to the case
              normalization issue noted above, some URI producers percent-encode
              octets that do not require percent-encoding, resulting in URIs that
              are equivalent to their non-encoded counterparts. These URIs should
              be normalized by decoding any percent-encoded octet that corresponds
              to an unreserved character, as described in Section 2.3."

              Again, this section isn't talking about resources versus
              navigation-inside-resources, it's just talking about URIs as a whole
              (of which the fragment is a part) and the process for normalizing
              them. That process decodes "any percent-encoded octet that corresponds
              to an unreserved character", leaving the recipient of the URI
              completely unable to distinguish between "#abc.def" and "#abc%2Edef".
              Obviously for an application that is performing purely internal
              processing with the fragments that may not be a problem, but for any
              other use, a URI processor perfectly compliant with RFC3986 can break
              this fragment format by removing the ability to refer to keys with
              embedded dots.
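A minimal Python sketch of the normalization step quoted above (RFC 3986, sections 6.2.2.1 and 6.2.2.2) shows why "#abc.def" and "#abc%2Edef" become indistinguishable while an escaped slash survives. The function name is hypothetical, and a real URI library may apply further normalizations.

```python
import re

UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")

def normalize_unreserved(uri):
    """Decode %XX when XX is an unreserved character; otherwise keep
    the escape but uppercase its hex digits."""
    def fix(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else m.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, uri)

print(normalize_unreserved("#abc%2Edef"))  # #abc.def
print(normalize_unreserved("#abc%2fdef"))  # #abc%2Fdef
```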

              On the same note there is this from section 2.4:

              "[T]he components and subcomponents
              significant to the scheme-specific dereferencing process (if any)
              must be parsed and separated before the percent-encoded octets within
              those components can be safely decoded, as otherwise the data may be
              mistaken for component delimiters. The only exception is for
              percent-encoded octets corresponding to characters in the unreserved
              set, which *can be decoded at any time*."

              (My emphasis.)

              So the problem, again, is that a perfectly compliant URI parser can
              (and the following text may imply that it should) replace all
              percent-encoded sequences corresponding to unreserved characters
              before parsing the URI into components.
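The section 2.4 point can be seen in a few lines of Python: with a reserved delimiter, splitting must come before decoding; under the slash-delimited format, early decoding of an unreserved escape like `%2E` is harmless, whereas under the dot-delimited format it changes which keys come out. The helper names are hypothetical.

```python
from urllib.parse import unquote

def keys_split_then_decode(fragment, sep="/"):
    """Correct order: split on the delimiter, then decode each key."""
    return [unquote(k) for k in fragment.split(sep)]

def keys_decode_then_split(fragment, sep="/"):
    """What a parser that decodes early effectively does."""
    return unquote(fragment).split(sep)

# Reserved '/': decoding %2F early turns it into a delimiter.
print(keys_split_then_decode("foo%2Fbar"))  # ['foo/bar']
print(keys_decode_then_split("foo%2Fbar"))  # ['foo', 'bar']

# Slash format: an early-decoded %2E changes nothing.
print(keys_decode_then_split("foo%2Ebar"))  # ['foo.bar']

# Dot format: the same early decoding splits one key into two.
print(keys_split_then_decode("foo%2Ebar", sep="."))  # ['foo.bar']
print(keys_decode_then_split("foo%2Ebar", sep="."))  # ['foo', 'bar']
```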

              And as a matter of actual practice, Firefox - although not IE or
              Chrome - *does* normalize fragments in exactly this way. Type in
              http://www.yahoo.com/#%2E and it will be replaced with
              http://www.yahoo.com/#. after you hit enter. This would be an active
              problem for me if I used the dot-delimited format (although I hadn't
              run into it before I switched to slashes), since I use the URI
              fragment to indicate state in a browser application, including (under
              certain circumstances) acting as a pointer into a separate JSON
              structure. Use of the fragment to carry application state (for the
              purposes of browser history and the back button) is a pretty common
              technique in AJAX applications.

              --
              jacob@...