Why would it be invalid?
Misha
From: email@example.com [mailto:firstname.lastname@example.org] On Behalf Of Michael Steidl (IPTC)
Sent: 04 January 2012
· You say http://example.com/people? is the scheme URI
· … and the “raw” code string is “id=12345&group=223”
· There we are at the point: who has to encode the code part of a QCode? I understand from your posting that even this code part may include characters that act as delimiters under the syntax of the protocol and therefore must not be encoded, but only the provider knows which characters have this role.
· In this case the only possible rule is: the provider has to percent-encode both the scheme URI *and* the code. A QCode like “fc:#3FTSE” from the Guidelines would then be invalid.
I'm trying to follow up Philippe's considerations below and in earlier
postings - assuming that & is a reserved character (see my footnote on that
at the bottom of my posting):
* Philippe's initial issue was "that some URIs with reserved characters
aren't representable by QCodes", and an example was
* then Philippe wrote in a later email: " The problem is that when resolving
a QCode, the reserved characters in the code must be percent encoded (per
the current wording), which means that you will never get back, for example,
the "&" in the URI shown above (you'll get back %26 instead)."
Based on the Processing Model in 10.2.1.2 of the Specs document my view is:
Already at the time of defining a Scheme URI the creator of this URI has to consider encoding. Reason: the Scheme URI as such may deliver a full CV -
therefore it must be a valid URI.
Let's assume this should be a Concept URI - as simple string (be aware of
the & in 2&23)
And the scheme URI should go from left to the = to the right of "id".
As the & in 2&23 is NOT a delimiter it must be encoded, and the Scheme URI is
http://example.com/people?group=2%2623&id= ... and as Alias let's define
If the code of the concept is 12345 then the QCode is pers1:12345
Resolving this QCode to a Concept URI should make
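The resolution step in this example can be sketched in Python as follows. This is only an illustration under stated assumptions: the `SCHEME_ALIASES` table and the `resolve_qcode` helper are hypothetical names, not part of the G2 specification, and the Scheme URI is taken to be percent-encoded by its creator already.

```python
# Hypothetical alias table: alias -> Scheme URI as published by the provider.
# The Scheme URI is assumed to be percent-encoded already ("2&23" -> "2%2623").
SCHEME_ALIASES = {
    "pers1": "http://example.com/people?group=2%2623&id=",
}


def resolve_qcode(qcode: str) -> str:
    """Resolve a QCode by plain concatenation of Scheme URI and code."""
    alias, code = qcode.split(":", 1)  # split at the first colon only
    return SCHEME_ALIASES[alias] + code


print(resolve_qcode("pers1:12345"))
# http://example.com/people?group=2%2623&id=12345
```

No encoding happens at resolution time in this sketch; the code 12345 contains no reserved characters, so the question of who encodes does not arise yet.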
* then Philippe wrote in his email of 30/12 about what part of the G2
Processing Model is a problem: " In my view, it's in step 2 of
This step says for processing a resolved QCode:
"Check if any Reserved Characters (see RFC 3986 section 2.2 ) are
included based on the used URI
scheme. In particular check for the reserved characters of the http scheme.
If such characters are
used apply percent-encoding as per RFC 3986."
I agree this directive is a bit confusing - at this point in the Processing
Model - as by the example above:
- the & in 2&23 is already percent-encoded
- the & before the "id" is a delimiter and therefore must not be
percent-encoded - but this cannot be definitely known by the receiver.
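The receiver's dilemma can be illustrated with Python's standard `urllib.parse` module. This is a sketch only; the variable names are illustrative, and the generic `quote()` call stands in for whatever blanket encoding a receiver might apply to an already resolved URI.

```python
from urllib.parse import parse_qs, quote, urlsplit

resolved = "http://example.com/people?group=2%2623&id=12345"
query = urlsplit(resolved).query

# Read as the provider intended: "&" separates two parameters,
# "%26" is a data character inside the value of "group".
print(parse_qs(query))
# {'group': ['2&23'], 'id': ['12345']}

# A receiver that runs a generic percent-encoder over the query cannot tell
# the delimiter "&" from data, and it also re-encodes the existing "%".
blind = quote(query, safe="=")
print(blind)
# group=2%252623%26id=12345
print(parse_qs(blind))
# {'group': ['2%2623&id=12345']}
```

After the blanket encoding the query no longer carries two parameters, which is why the decision has to be made by whoever knows which characters are delimiters.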
**Conclusion**: the Processing Model in the Spec document should be
rearranged in a way that the responsibility for percent-encoding has to be
taken by the creator of a Concept URI.
The only facet which should be discussed: is it required to percent-encode
also the code part of a QCode?
Let's change the code based on the example above to 12&34, the QCode is
Resolving this QCode makes http://example.com/people?group=2%2623&id=12&34 -
but the & in 12&34 is not a delimiter, therefore must be encoded. Could this
action be moved to the receiver as a code is - by my understanding - always
a "component" in RFC 3986 terms and thus requires encoding of reserved
The reason for that is simple: there are vocabularies which exist for a
long time with codes which were defined without any knowledge about URI
syntax and the RFC 3986. Therefore a code like 12&34 could be widely known
and changing it to 12%2634 would cause a lot of confusion.
Therefore the responsibility for percent-encoding in the Processing Model should lie:
- with the creator of the Scheme URI for the Scheme URI
- with the receiver and processor of a QCode for the code part; the receiver can
assume that the scheme URI is already percent-encoded as required.
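A minimal sketch of this split of responsibilities, assuming Python's `urllib.parse.quote` as the encoder on the receiver side. The `resolve` function and the `SCHEME_URI` constant are hypothetical names used for illustration only.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Assumed to be published by the provider, already percent-encoded.
SCHEME_URI = "http://example.com/people?group=2%2623&id="


def resolve(code: str) -> str:
    """Receiver side: encode only the raw code, never the Scheme URI."""
    # safe="" makes quote() encode every character outside the unreserved set.
    return SCHEME_URI + quote(code, safe="")


uri = resolve("12&34")
print(uri)
# http://example.com/people?group=2%2623&id=12%2634
print(parse_qs(urlsplit(uri).query))
# {'group': ['2&23'], 'id': ['12&34']}
```

The well-known code 12&34 stays unchanged in the QCode and in the CV; only the resolved URI carries 12%2634.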
But there is still a tripwire: the creator of a CV MUST NOT percent-encode
codes - or MAY the creator percent-encode?
If the MUST NOT applies a code like 12%2634 must be percent-encoded to
12%252634 by the receiver.
But what action should be taken by the receiver if the MAY rule applies? The
pure string 12%2634 does not tell if the % comes from percent-encoding or
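The two rules can be compared in a short Python sketch (illustrative variable names; `quote` and `unquote` come from the standard `urllib.parse` module):

```python
from urllib.parse import quote, unquote

code = "12%2634"  # what the receiver finds in the code part of the QCode

# MUST NOT rule: the "%" is literal data, so the receiver encodes it too.
must_not = quote(code, safe="")
print(must_not)
# 12%252634

# MAY rule: the same string has two possible readings, and nothing in the
# string itself says which one the provider meant.
as_raw_code = code                # the literal characters 1 2 % 2 6 3 4
as_encoded_code = unquote(code)   # the provider already encoded "12&34"
print(as_raw_code, as_encoded_code)
# 12%2634 12&34
```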
The final conclusion also has to be applied to section 13.8 of the
Implementation Guide, as it currently contradicts itself, sorry:
it says that reserved characters in codes must be encoded by the provider, but also
claims that a QCode may be "fc:#3FTSE" as the # in the code will be encoded
by the receiver.
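Why the # in "fc:#3FTSE" matters can be shown with a small Python sketch. The Scheme URI below is purely hypothetical (the real one for the alias "fc" is not given in this thread); the point is only how a generic URI parser treats a raw versus an encoded #.

```python
from urllib.parse import quote, urlsplit

# Hypothetical Scheme URI for the alias "fc"; an assumption for illustration.
FC_SCHEME_URI = "http://example.com/cv/fc/"

alias, code = "fc:#3FTSE".split(":", 1)

# Nobody encodes the code: "#" is read as the start of a fragment.
naive = urlsplit(FC_SCHEME_URI + code)
print((naive.path, naive.fragment))
# ('/cv/fc/', '3FTSE')

# The receiver encodes the code: "#" stays part of the path as %23.
encoded = urlsplit(FC_SCHEME_URI + quote(code, safe=""))
print((encoded.path, encoded.fragment))
# ('/cv/fc/%233FTSE', '')
```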
What do you think?
This "reserved purpose" is exactly what is causing me headaches, as I was not able to
find an *explicit* definition that e.g. & has a purpose which makes it a
reserved character for the http URI scheme.
I've emphasized *explicit* as the http RFC 2616 doesn't even mention &, on
the other hand the URI RFC 3986 includes & into its - potential - reserved
characters  but says the state of being a reserved character is actually
defined by the specifications of the different URI schemes. But as just
said: the specification of the http scheme doesn't even mention the &
character. So do we have to build on practical experience combined with
feelings from the guts or written specifications?
> -----Original Message-----
> From: email@example.com [mailto:newsml-
> firstname.lastname@example.org] On Behalf Of Philippe Mougin
> Sent: Wednesday, January 04, 2012 9:38 AM
> To: email@example.com
> Subject: [newsml-g2] Re: URI and qcodes - does some one has examples and
> documentation ?
> --- In firstname.lastname@example.org, misha.wolf@... wrote:
> > It seems to me that:
> > - an "&" has a reserved purpose in the query component of a URI
> > - when it is being used for that purpose it should *not* be percent-
> In the query component, & and %26 do not mean the same thing at all. An
> application that mints a URI will use either & or %26 depending on the
> meaning it wants to give to that URI.
> This is why percent encoding of reserved characters (where they have a
> reserved purpose) can only be done meaningfully by the application
> producing the URI.
> As far as QCodes are concerned, percent encoding can't be done
> meaningfully at QCode resolution time (which is, in my understanding,
> what the G2 spec currently specifies): it is too late. Instead, it should happen when
> an application produces a QCode (i.e., writes it down in a NewsML-G2
> document). This is when percent encoding the code can be done.
Jan 5, 2012
--- In email@example.com, "Michael Steidl (IPTC)" <mdirector@...> wrote:
It's indeed a bit tricky! It works like this: a specific scheme does not have to explicitly list characters with a reserved purpose for that scheme. By default, it inherits those from the generic syntax. It can, however, "override" the role of a given character in some of its components: in that case it has to state this explicitly. As you remark, RFC 2616 does not mention &, so the generic syntax applies and therefore & in the query of an http URI must be percent-encoded by the producing application if it is not used as a separator but as a regular data octet.
> This "reserved purpose" is exactly causing me headaches as I was not able to
> find an *explicit* definition that e.g. & has a purpose which makes it a
> reserved character for the http URI scheme.
> I've emphasized *explicit* as the http RFC 2616 doesn't even mention &, on
> the other hand the URI RFC 3986 includes & into its - potential - reserved
> characters  but says the state of being a reserved character is actually
> defined by the specifications of the different URI schemes. But as just
> said: the specification of the http scheme doesn't even mention the &
> character. So do we have to build on practical experience combined with
> feelings from the guts or written specifications?
What I summarize here is a consequence of various ABNF rules in the generic syntax (RFC 3986), starting with the rule for the query component, and some prose in section 2.2. In particular: "each syntax rule lists the characters allowed within that component (i.e., not delimiting it), and any of those characters that are also in the reserved set are "reserved" for use as subcomponent delimiters within the component" and "URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component."
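On the producing side this rule is easy to follow, because the producer knows which & is data and which is a separator. A minimal Python sketch, assuming the standard `urllib.parse.urlencode` as the producer's encoder and reusing the group/id parameters from the example earlier in the thread:

```python
from urllib.parse import parse_qs, urlencode

# Producer side: the data values are known, so "&" inside a value is encoded
# while "&" between parameters is left as a delimiter.
params = [("group", "2&23"), ("id", "12&34")]
query = urlencode(params)
print(query)
# group=2%2623&id=12%2634

# "&" and "%26" are not interchangeable in the query component.
print(parse_qs("group=2%2623&id=12345"))
# {'group': ['2&23'], 'id': ['12345']}
print(parse_qs("group=2&23&id=12345"))
# {'group': ['2'], 'id': ['12345']}
```

In the last call the unencoded & is read as a separator, so the value of group is cut short and the stray "23" is dropped by this particular parser.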