RE: [newsml-g2] Re: URI and qcodes - does some one has examples and documentation ?

  • misha.wolf@thomsonreuters.com
    Message 1 of 35 , Jan 4, 2012

      Why would it be invalid?

       

      Misha

       

      From: newsml-g2@yahoogroups.com [mailto:newsml-g2@yahoogroups.com] On Behalf Of Michael Steidl (IPTC)
      Sent: 04 January 2012 14:33
      To: newsml-g2@yahoogroups.com
      Subject: RE: [newsml-g2] Re: URI and qcodes - does some one has examples and documentation ?

       




      Kostas,

       

      ·         You say http://example.com/people? is the scheme URI

      ·         … and the “raw” code string is “id=12345&group=223”

      ·         There we are at the point: who has to encode the code part of a QCode? I understand from your posting that even this code part may include delimiters by the syntax of the protocol which must not be encoded, but only the provider knows which characters have this role.

      ·         In this case the only possible rule is: the provider has to percent-encode both the scheme URI *and* the code. A QCode like “fc:#3FTSE” from the Guidelines would then be invalid.

       

      Michael

       

      From: newsml-g2@yahoogroups.com [mailto:newsml-g2@yahoogroups.com] On Behalf Of Kostas Konstantoulis
      Sent: Wednesday, January 04, 2012 2:35 PM
      To: newsml-g2@yahoogroups.com
      Subject: RE: [newsml-g2] Re: URI and qcodes - does some one has examples and documentation ?

       



      Michael,

       

      I think Philippe says that different persons may belong to different groups so you cannot hardwire the group id in the scheme URI.

      So, the scheme URI should be http://example.com/people? (if this is a legal URI) and the qcode (e.g. for id=12345&group=223) something like “id=12345%26group=223”.

      Then, the ampersand encoding problem is obvious: the receiver does not know which URL to resolve to

      http://example.com/people?id=12345&group=223 or http://example.com/people?id=12345%26group=223

       

      Generally, the problem arises when the complete URL contains more than one parameter, so that reserved characters fall within the qcode part of the URL.
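
As a small illustration of this ambiguity, the following Python sketch parses the query part of the two candidate URLs. It uses the standard `urllib.parse` module, which is only one possible query parser and is not prescribed by G2; the variable names are made up for this example.

```python
from urllib.parse import parse_qs

# Query strings of the two candidate URLs from the example above.
delimiter_query = "id=12345&group=223"    # "&" acts as a delimiter
encoded_query = "id=12345%26group=223"    # "%26" is data inside the value

as_delimiter = parse_qs(delimiter_query)
as_data = parse_qs(encoded_query)

# Two parameters: id and group.
print(as_delimiter)
# One parameter: id, whose value contains a literal "&".
print(as_data)
```

The two URLs are therefore not interchangeable: a server would see two parameters in the first case and a single parameter in the second.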

       

      Kostas


      From: newsml-g2@yahoogroups.com [mailto:newsml-g2@yahoogroups.com] On Behalf Of Michael Steidl (IPTC)
      Sent: Wednesday, January 04, 2012 1:54 PM
      To: newsml-g2@yahoogroups.com
      Subject: RE: [newsml-g2] Re: URI and qcodes - does some one has examples and documentation ?

       

       

      I'm trying to follow up Philippe's considerations below and in earlier
      postings - assuming that & is a reserved character (see my footnote on that
      at the bottom of my posting):

      * Philippe's initial issue was "that some URIs with reserved characters
      aren't representable by QCodes", and an example was
      http://example.com/people?id=12345&group=223

      * then Philippe wrote in a later email: " The problem is that when resolving
      a QCode, the reserved characters in the code must be percent encoded (per
      the current wording), which means that you will never get back, for example,
      the "&" in the URI shown above (you'll get back %26 instead)."

      Based on the Processing Model in 10.2.1.2 of the Specs document my view is:
      Already at the time of defining a Scheme URI the creator of this URI has to consider encoding. Reason: the Scheme URI as such may deliver a full CV -
      therefore it must be a valid URI.
      Let's assume this should be a Concept URI - as simple string (be aware of
      the & in 2&23)
      http://example.com/people?group=2&23&id=12345
      And the scheme URI should go from left to the = to the right of "id".
      As the & in 2&23 is NOT a delimiter it must be encoded and the Scheme URI
      must be:
      http://example.com/people?group=2%2623&id= ... and as Alias let's define
      pers1

      If the code of the concept is 12345 then the QCode is pers1:12345
      Resolving this QCode to a Concept URI should make
      http://example.com/people?group=2%2623&id=12345
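
The resolution described here can be sketched in Python as a plain concatenation of the scheme URI and the code. This is a minimal sketch, assuming the scheme URI is already percent-encoded by its creator; the names `SCHEME_ALIASES` and `resolve_qcode` are hypothetical and not taken from the G2 specification.

```python
# Hypothetical alias table: scheme alias -> scheme URI (already percent-encoded
# by the creator of the scheme, as argued above).
SCHEME_ALIASES = {
    "pers1": "http://example.com/people?group=2%2623&id=",
}


def resolve_qcode(qcode, aliases=SCHEME_ALIASES):
    """Resolve a QCode to a Concept URI by appending the code to the scheme URI."""
    # Split at the first colon only, so a code may itself contain colons.
    alias, sep, code = qcode.partition(":")
    if not sep or alias not in aliases:
        raise ValueError(f"cannot resolve QCode: {qcode!r}")
    # No percent-encoding is applied here; this sketch only concatenates.
    return aliases[alias] + code


concept_uri = resolve_qcode("pers1:12345")
print(concept_uri)
```

Who should percent-encode the code part is deliberately left open in this sketch, since that is the question under discussion.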

      * then Philippe wrote in his email of 30/12 about what part of the G2
      Processing Model is a problem: " In my view, it's in step 2 of
      10.2.1.2.2.1."

      This step says for processing a resolved QCode:
      "Check if any Reserved Characters (see RFC 3986 section 2.2 [1]) are
      included based on the used URI
      scheme. In particular check for the reserved characters of the http scheme.
      If such characters are
      used apply percent-encoding as per RFC 3986."

      I agree this directive is a bit confusing - at this point in the Processing
      Model - as by the example above:
      - the & in 2&23 is already percent-encoded
      - the & before the "id" is a delimiter and therefore must not be
      percent-encoded - but this cannot be definitely known by the receiver.

      **Conclusion**: the Processing Model in the Spec document should be
      rearranged in a way that the responsibility for percent-encoding has to be
      taken by the creator of a Concept URI.
      The only facet which should be discussed: is it required to percent-encode
      also the code part of a QCode?
      Let's change the code based on the example above to 12&34; the QCode is
      pers1:12&34
      Resolving this QCode makes http://example.com/people?group=2%2623&id=12&34 -
      but the & in 12&34 is not a delimiter and therefore must be encoded. Could this
      action be moved to the receiver, as a code is - by my understanding - always
      a "component" in RFC 3986 terms and thus requires encoding of reserved
      characters?
      The reason for that is simple: there are vocabularies which exist for a
      long time with codes which were defined without any knowledge about URI
      syntax and the RFC 3986. Therefore a code like 12&34 could be widely known
      and changing it to 12%2634 would cause a lot of confusion.

      Therefore the responsibility for percent-encoding in the Processing Model
      could be
      - with the creator of the Scheme URI for the Scheme URI
      - with the receiver and processor of a QCode for the code part, who can
      assume that the scheme URI is already percent-encoded as required.

      But there is still a tripwire: the creator of a CV MUST NOT percent-encode
      codes - or MAY the creator percent-encode?
      If the MUST NOT applies, a code like 12%2634 must be percent-encoded to
      12%252634 by the receiver.
      But what action should be taken by the receiver if the MAY rule applies? The
      pure string 12%2634 does not tell if the % comes from percent-encoding or
      not.
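
The tripwire can be made concrete with a short Python sketch. It assumes the receiver encodes the code part with `urllib.parse.quote` and an empty `safe` set, which is one plausible reading of "percent-encode all reserved characters" and not a rule taken from the spec; `encode_code` is a hypothetical helper.

```python
from urllib.parse import quote, unquote


def encode_code(code):
    """Percent-encode every character of a code except unreserved ones."""
    return quote(code, safe="")


# MUST NOT rule: the provider sends raw codes, the receiver always encodes.
legacy_code = encode_code("12&34")         # raw code with a literal "&"
literal_percent = encode_code("12%2634")   # raw code with a literal "%"

print(legacy_code)
print(literal_percent)

# MAY rule: the received string "12%2634" could be either the raw code
# "12%2634" or the already-encoded form of "12&34". Both readings are
# consistent with the same string, so the receiver cannot decide.
received = "12%2634"
print(unquote(received), "or", received)
```

Under the MUST NOT rule the mapping is unambiguous in both directions; under the MAY rule the same received string maps to two different raw codes.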

      The final conclusion has to be applied also to section 13.8 of the
      Implementation Guide, as it currently contradicts itself, sorry:
      it says that reserved characters in codes must be encoded by the provider, but also
      claims that a QCode may be "fc:#3FTSE" as the # in the code will be encoded
      by the receiver.

      What do you think?

      Michael

      Footnote:
      This "reserved purpose" is exactly causing me headaches as I was not able to
      find an *explicit* definition that e.g. & has a purpose which makes it a
      reserved character for the http URI scheme.
      I've emphasized *explicit* as the http RFC 2616 doesn't even mention &, on
      the other hand the URI RFC 3986 includes & into its - potential - reserved
      characters [1] but says the state of being a reserved character is actually
      defined by the specifications of the different URI schemes. But as just
      said: the specification of the http scheme doesn't even mention the &
      character. So do we have to build on practical experience combined with
      feelings from the guts or written specifications?

      [1] http://tools.ietf.org/html/rfc3986#section-2.2

      > -----Original Message-----
      > From: newsml-g2@yahoogroups.com [mailto:newsml-
      > g2@yahoogroups.com] On Behalf Of Philippe Mougin
      > Sent: Wednesday, January 04, 2012 9:38 AM
      > To: newsml-g2@yahoogroups.com
      > Subject: [newsml-g2] Re: URI and qcodes - does some one has examples and
      > documentation ?
      >
      > --- In newsml-g2@yahoogroups.com, misha.wolf@... wrote:
      > >
      > > It seems to me that:
      > > - an "&" has a reserved purpose in the query component of a URI
      > > - when it is being used for that purpose it should *not* be percent-
      > encoded
      >
      > Exactly!
      > In the query component, & and %26 do not mean the same thing at all. An
      > application that mints a URI will use either & or %26 depending on the
      > meaning it wants to give to that URI.
      >
      > This is why percent encoding of reserved characters (where they have a
      > reserved purpose) can only be done meaningfully by the application
      > producing the URI.
      >
      > As far as QCodes are concerned, percent encoding can't be done
      > meaningfully at QCode resolution time (which is, in my understanding, what
      > the G2 spec currently specifies): it is too late. Instead, it should happen when
      > an application produces a QCode (i.e., writes it down in a NewsML-G2
      > document). This is when percent encoding the code can be done.
      >
      > Philippe
      >
      >
      >
      >
      > ------------------------------------
      >
      > Any member of this IPTC moderated Yahoo group must comply with the
      > Intellectual Property Policy of the IPTC, available at
      > http://www.iptc.org/goto/ipp. Any posting is assumed to be submitted
      > under the conditions of this IPTC IP Policy.
      > Yahoo! Groups Links
      >
      >
      >



       






      This email was sent to you by Thomson Reuters, the global news and information company. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Thomson Reuters.
    • Philippe Mougin
      Message 35 of 35 , Jan 5, 2012
        --- In newsml-g2@yahoogroups.com, "Michael Steidl (IPTC)" <mdirector@...> wrote:
        >
        > This "reserved purpose" is exactly causing me headaches as I was not able to
        > find an *explicit* definition that e.g. & has a purpose which makes it a
        > reserved character for the http URI scheme.
        > I've emphasized *explicit* as the http RFC 2616 doesn't even mention &, on
        > the other hand the URI RFC 3986 includes & into its - potential - reserved
        > characters [1] but says the state of being a reserved character is actually
        > defined by the specifications of the different URI schemes. But as just
        > said: the specification of the http scheme doesn't even mention the &
        > character. So do we have to build on practical experience combined with
        > feelings from the guts or written specifications?

        It's indeed a bit tricky! It works like this: a specific scheme does not have to explicitly list characters with a reserved purpose for that scheme. By default, it inherits those from the generic syntax. It can, however, "override" the role of a given character in some of its components: in that case it has to state this explicitly. As you remark, RFC 2616 does not mention &, so the generic syntax applies and therefore & in the query of an http URI must be percent-encoded by the producing application if it is not used as a separator but as a regular data octet.

        What I summarize here is a consequence of various ABNF rules in the generic syntax (RFC 3986), starting with the rule for the query component, and some prose in section 2.2. In particular: "each syntax rule lists the characters allowed within that component (i.e., not delimiting it), and any of those characters that are also in the reserved set are "reserved" for use as subcomponent delimiters within the component" and "URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component."
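
A producing application that follows this rule might look like the Python sketch below. It relies on `urllib.parse.urlencode`, which encodes data octets in names and values and inserts "&" only as a separator; the base URL and parameter values are the illustrative ones used earlier in the thread, and the variable names are made up here.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# The producer knows which "&" is data (inside "2&23") and which is a separator.
params = {"group": "2&23", "id": "12345"}

query = urlencode(params)
uri = "http://example.com/people?" + query
print(uri)

# A consumer parsing the query recovers the original values unchanged.
recovered = parse_qs(urlsplit(uri).query)
print(recovered)
```

Because the encoding is done at production time, no guessing about the role of each "&" is needed on the receiving side.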

        Philippe