Re: URI and qcodes - does some one has examples and documentation ?

  • Philippe Mougin
    Message 1 of 35, Dec 30, 2011
      In my view, it's in step 2 of 10.2.1.2.2.1.

      When a system receives a G2 document and resolves a QCode from this document using this processing model (which I assume is designed to do that), reserved characters get percent encoded (e.g., an "&" gets encoded to "%26"). This prevents resolving to URIs with "&" inside (this is why the URI I gave as an example can't be represented by a QCode).

      Note that the "Note" in step 2 can't help us here: it is useless in the context I'm describing because a receiver, even at the application level, doesn't know whether a reserved character is a delimiter or just some data. Only the producer can know.
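
The effect described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processing model from the specification: the `CATALOG` mapping, the `ex` alias, and the `resolve_qcode` helper are hypothetical names, and the sketch simply percent-encodes every character outside the unreserved set, which may be broader than what the specification actually requires.

```python
from urllib.parse import quote

# Hypothetical catalog mapping a scheme alias to a scheme URI (illustrative only).
CATALOG = {"ex": "http://example.com/"}


def resolve_qcode(qcode, catalog):
    """Sketch of a receiver resolving a QCode to a URI.

    Assumption: the receiver percent-encodes every reserved character
    found in the code part before appending it to the scheme URI.
    """
    alias, code = qcode.split(":", 1)
    # safe="" makes quote() encode "/", "?", "&", "=" and the other reserved characters.
    return catalog[alias] + quote(code, safe="")


resolved = resolve_qcode("ex:people?id=12345&group=223", CATALOG)
print(resolved)
```

Under these assumptions the receiver never gets back a literal "&": the resolved URI contains "%26" where the producer may have intended a delimiter, and nothing in the QCode tells the receiver which was meant.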

      Philippe

      --- In newsml-g2@yahoogroups.com, misha.wolf@... wrote:
      >
      > True.
      >
      > I'm looking at section 10.2 of the specification. Where is the problematical wording?
      >
      > Thanks,
      > Misha
      >
      >
      > -----Original Message-----
      > From: newsml-g2@yahoogroups.com [mailto:newsml-g2@yahoogroups.com] On Behalf Of Philippe Mougin
      > Sent: 30 December 2011 17:58
      > To: newsml-g2@yahoogroups.com
      > Subject: [newsml-g2] Re: URI and qcodes - does some one has examples and documentation ?
      >
      > That's good interoperability advice that should be followed.
      > It applies to unreserved characters, though. The problem I mention relates to reserved ones. (Or maybe I'm missing the point?)
      >
      > Philippe
      >
      > --- In newsml-g2@yahoogroups.com, misha.wolf@ wrote:
      > >
      > > The Wikipedia says [1]:
      > > URIs that differ only by whether an unreserved character is percent-encoded or appears literally are equivalent by definition, but URI processors, in practice, may not always recognize this equivalence. For example, URI consumers shouldn't treat "%41" differently from "A" or "%7E" differently from "~", but some do. For maximum interoperability, URI producers are discouraged from percent-encoding unreserved characters.
      > >
      > > [1] http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_unreserved_characters
      > >
      > > Misha
      > >
      > > -----Original Message-----
      > > From: newsml-g2@yahoogroups.com [mailto:newsml-g2@yahoogroups.com] On Behalf Of Philippe Mougin
      > > Sent: 30 December 2011 17:37
      > > To: newsml-g2@yahoogroups.com
      > > Subject: [newsml-g2] Re: URI and qcodes - does some one has examples and documentation ?
      > >
      > >
      > > > The intention is that all URIs can be represented using QCodes. If any
      > > > of our wording prevents this then such wording needs to be corrected.
      > >
      > > Ah, interesting. Indeed, I think the current specification prevents some URIs from being represented by QCodes. Take for instance the example I gave (i.e., http://example.com/people?id=12345&group=223): I don't see how it could be represented by a QCode (I'd like to be proven wrong, though).
      > >
      > > The problem is that when resolving a QCode, the reserved characters in the code must be percent encoded (per the current wording), which means that you will never get back, for example, the "&" in the URI shown above (you'll get back %26 instead).
      > >
      > > The goal of the percent encoding mechanism in URIs with regard to reserved characters is to distinguish the usage of a reserved character as a data octet from its usage as a delimiter: when producing a URI, you percent encode a reserved character when it is a data octet, and you don't when it is a delimiter. Percent encoding changes the way a URI will be interpreted. For example, http://example.com/people?id=12345&group=223 and http://example.com/people?id=12345%26group%3D223 won't be interpreted in the same way by applications (by design).
      > >
      > > If this is recognized as a problem by the NewsML-G2 designers (i.e., if you want all URIs to be representable by QCodes), I think, at first glance, that a solution is to remove step 5 from the QCode resolution process and to specify that codes must be percent encoded when they are produced (e.g., written inside a NewsML-G2 document). Indeed, only the application that synthesizes a code can know whether a given reserved character represents a data octet or a delimiter. A receiver can't.
      > >
      > > Philippe
      > >
      > > --- In newsml-g2@yahoogroups.com, misha.wolf@ wrote:
      > > >
      > > > Hi Philippe,
      > > >
      > > > The intention is that all URIs can be represented using QCodes. If any
      > > > of our wording prevents this then such wording needs to be corrected.
      > > >
      > > > I am not certain, though, that anything does need correcting. Surely,
      > > > it is safe to percent encode any chars in a URI.
      > > >
      > > > Misha
      > > >
      > > > -----Original Message-----
      > > > From: newsml-g2@yahoogroups.com [mailto:newsml-g2@yahoogroups.com] On Behalf Of Philippe Mougin
      > > > Sent: 30 December 2011 16:12
      > > > To: newsml-g2@yahoogroups.com
      > > > Subject: [newsml-g2] Re: URI and qcodes - does some one has examples and documentation ?
      > > >
      > > > Hi Misha,
      > > >
      > > > My understanding is that some URIs with reserved characters aren't representable by QCodes (e.g. http://example.com/people?id=12345&group=223). This is due to step 5 of the QCode resolution process (13.9.2 of the implementation guide), where reserved characters in the code are percent encoded.
      > > >
      > > > Philippe
      > > >
      > > > --- In newsml-g2@yahoogroups.com, misha.wolf@ wrote:
      > > > >
      > > > > Hi Philippe,
      > > > >
      > > > > You write: "the subset of URIs representable by QCodes"? Which URIs are not representable by QCodes?
      > > > >
      > > > > Misha
      > > > >
      > > > > -----Original Message-----
      > > > > From: newsml-g2@yahoogroups.com [mailto:newsml-g2@yahoogroups.com] On Behalf Of Philippe Mougin
      > > > > Sent: 30 December 2011 15:34
      > > > > To: newsml-g2@yahoogroups.com
      > > > > Subject: [newsml-g2] Re: URI and qcodes - does some one has examples and documentation ?
      > > > >
      > > > > Hello Michael,
      > > > >
      > > > > That is going to be very useful for us. We design our systems to be relatively open and future proof, and from our G2 items we need to reference things that have URIs but cannot always be referenced by QCodes. The two main reasons are that sometimes we don't want to restrict ourselves to the subset of URIs representable by QCodes, and that a QCode implies a G2 controlled vocabulary, which in turn implies a number of rules that we cannot always guarantee (or do not want to).
      > > > >
      > > > > Since we are at it, may I ask whether the developer group considered adding URI-valued attributes not only in parallel with "qcode" attributes but also in parallel with any attribute whose value is a QCode (e.g., adding a "roleuri" attribute to the <description> element in parallel with the existing "role" attribute)? This would make it possible to create documents that are considerably easier to process (for example, using XPath to exploit such documents would become possible).
      > > > >
      > > > > Thanks,
      > > > >
      > > > > Philippe
      > > > >
      > > > > --- In newsml-g2@yahoogroups.com, "Michael Steidl (IPTC)" <mdirector@> wrote:
      > > > > >
      > > > > > Hello Jean, Philippe and all:
      > > > > >
      > > > > > to reveal a secret from the G2 developer group: the next version of the
      > > > > > G2-Standards (to be released in about April 2012) will have a "uri" attribute
      > > > > > in parallel with the qcode and literal attributes. (In other words: almost all
      > > > > > properties which currently have a qcode attribute will have a uri attribute
      > > > > > too - with only 2 exceptions.)
      > > > > > The goal is to be able to express an identifier in three alternative ways:
      > > > > > A) by a URI ...
      > > > > > A.1) ... in the QCode format in the @qcode attribute
      > > > > > A.2) ... in its native format in the @uri attribute
      > > > > > B) by a literal value in the @literal attribute.
      > > > > >
      > > > > > The G2 specifications and guidelines say that comparing the identifiers of
      > > > > > two concepts can only be done by comparing the full URIs. Therefore the
      > > > > > requirement came up that, primarily for internal storage, it would be useful to
      > > > > > be able to store the full URI (= the expanded QCode) in the property - and
      > > > > > thereby to speed up comparing concept identifiers.
      > > > > >
      > > > > > Michael
      > > > > >
      > > > > > Michael Steidl
      > > > > > Managing Director of the IPTC [mdirector@]
      > > > > > International Press Telecommunications Council
      > > > > > Web: www.iptc.org - on Twitter @IPTC
      > > > > > Business office address:
      > > > > > 20 Garrick Street, London WC2E 9BT, United Kingdom
      > > > > > Registered in England, company no 101096
      > > > > >
      > > > > > > -----Original Message-----
      > > > > > > From: newsml-g2@yahoogroups.com [mailto:newsml-g2@yahoogroups.com] On Behalf Of Philippe Mougin
      > > > > > > Sent: Wednesday, December 28, 2011 10:45 AM
      > > > > > > To: newsml-g2@yahoogroups.com
      > > > > > > Subject: [newsml-g2] Re: URI and qcodes - does some one has examples and documentation ?
      > > > > > >
      > > > > > > Hello Jean,
      > > > > > >
      > > > > > > First, keep in mind that a QCode is just a URI written down using a
      > > > > > > compressed notation. It might be that the URIs you want to use to identify
      > > > > > > people and organizations can be expressed using the QCode notation.
      > > > > > > It might not be the case, though, as QCodes can only represent a narrow
      > > > > > > subset of URIs. If your URIs can't be expressed as QCodes, you have several
      > > > > > > options to use them in NewsML-G2. For example, suppose you want to
      > > > > > > identify the creator of a document using the URI
      > > > > > > http://example.com/people?id=12345. One option is to put this URI in the
      > > > > > > literal attribute. For example:
      > > > > > >
      > > > > > > <creator literal="http://example.com/people?id=12345">
      > > > > > >   <name>John Smith</name>
      > > > > > > </creator>
      > > > > > >
      > > > > > > However, you might want to express, at the NewsML-G2 level, the fact that
      > > > > > > your identifier is actually a URI (this can have several benefits that I won't
      > > > > > > detail here). To do that, an option is to use a remoteInfo element instead of
      > > > > > > the literal attribute. For example:
      > > > > > >
      > > > > > > <catalog>
      > > > > > >   <scheme alias="rel" uri="http://www.iana.org/assignments/relation/" />
      > > > > > > </catalog>
      > > > > > > ...
      > > > > > > <creator>
      > > > > > >   <name>John Smith</name>
      > > > > > >   <remoteInfo href="http://example.com/people?id=12345" rel="rel:self" />
      > > > > > > </creator>
      > > > > > >
      > > > > > > Finally, note that IPTC is considering the introduction of "uri" attributes
      > > > > > > alongside existing "qcode" attributes. If this is adopted, you might soon be able to
      > > > > > > write:
      > > > > > >
      > > > > > > <creator uri="http://example.com/people?id=12345">
      > > > > > >   <name>John Smith</name>
      > > > > > > </creator>
      > > > > > >
      > > > > > > Best,
      > > > > > >
      > > > > > > Philippe Mougin
      > > > > > >
      > > > > > > --- In newsml-g2@yahoogroups.com, "jdelahousse01" <delahousse.jean@> wrote:
      > > > > > > >
      > > > > > > > Hello,
      > > > > > > >
      > > > > > > > I am working as a consultant for a French media outlet on various
      > > > > > > > semantic-related subjects. We'll implement NewsML/EventsML as the backbone
      > > > > > > > format for back-office management of articles, events, bios, etc. (and rNews /
      > > > > > > > Schema.org for publication). To identify people and organizations we wish to
      > > > > > > > use URIs as identifiers.
      > > > > > > > I am trying to find documentation and examples about using URIs instead of
      > > > > > > > qcodes (or with qcodes) in a well-formed NewsML document.
      > > > > > > >
      > > > > > > > Would someone have XML files with examples, or a link to documentation?
      > > > > > > >
      > > > > > > > Thanks
      > > > > > > >
      > > > > > > > Jean
      > > > > > > >
      > >
      > > ------------------------------------
      > >
      > >
      > >
      > > Any member of this IPTC moderated Yahoo group must comply with the Intellectual Property Policy of the IPTC, available at http://www.iptc.org/goto/ipp. Any posting is assumed to be submitted under the conditions of this IPTC IP Policy.
      > >
      > > This email was sent to you by Thomson Reuters, the global news and information company. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Thomson Reuters.
      > >
      >
    • Philippe Mougin
      Message 35 of 35, Jan 5, 2012
        --- In newsml-g2@yahoogroups.com, "Michael Steidl (IPTC)" <mdirector@...> wrote:
        >
        > This "reserved purpose" is exactly what is causing me headaches, as I was not able to
        > find an *explicit* definition that e.g. & has a purpose which makes it a
        > reserved character for the http URI scheme.
        > I've emphasized *explicit* as the http RFC 2616 doesn't even mention &; on
        > the other hand, the URI RFC 3986 includes & in its - potential - reserved
        > characters [1] but says the state of being a reserved character is actually
        > defined by the specifications of the different URI schemes. But as just
        > said: the specification of the http scheme doesn't even mention the &
        > character. So do we have to build on practical experience combined with
        > gut feeling, or on written specifications?

        It's indeed a bit tricky! It works like this: a specific scheme does not have to explicitly list characters with reserved purpose for that scheme. By default, it inherits those from the generic syntax. It can, however, "override" the role of a given character in some of its components: in that case it has to state this explicitly. As you remark, RFC 2616 does not mention &, so the generic syntax applies and therefore & in the query of an http URI must be percent encoded by the producing application if it is used not as a separator but as a regular data octet.

        What I summarize here is a consequence of various ABNF rules in the generic syntax (RFC 3986), starting with the rule for the query component, and some prose in section 2.2. In particular: "each syntax rule lists the characters allowed within that component (i.e., not delimiting it), and any of those characters that are also in the reserved set are "reserved" for use as subcomponent delimiters within the component" and "URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component."
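
To make the producer-side rule concrete, here is a minimal Python sketch using only the standard library. The URIs are the ones used earlier in this thread; the variable names are illustrative, and the choice to encode the whole value with `safe=""` is an assumption made for the example rather than something RFC 3986 prescribes.

```python
from urllib.parse import parse_qs, quote, urlsplit

# "&" and "=" used as delimiters between and within two query parameters.
as_delimiter = "http://example.com/people?id=12345&group=223"

# Here "&" and "=" are part of a single data value, so the producing
# application percent-encodes them before writing the URI.
value = "12345&group=223"
as_data = "http://example.com/people?id=" + quote(value, safe="")

# A consumer parsing the query component sees two different things.
params_delimiter = parse_qs(urlsplit(as_delimiter).query)
params_data = parse_qs(urlsplit(as_data).query)

print(as_data)
print(params_delimiter)
print(params_data)
```

The first URI yields two parameters (`id` and `group`), while the second yields a single `id` parameter whose value contains the literal "&" and "=". Only the producer knew which of the two was intended.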

        Philippe