
RE: Nonsense re URNs in NewsML spec, DTD and schema

  • Misha Wolf
    Message 1 of 3, Oct 29, 2003
      Here is a proposal for correcting this mess.

      Replace:

      Note that the set of characters that can be included within a URN
      is limited. The allowed characters are specified by the Internet
      Engineering Task Force (IETF) in its Request For Comments (RFC)
      number 2141. This document is available at
      http://www.ietf.org/rfc/rfc2141.txt. Any character that is not
      within the permitted URN character set must be represented as a %
      character followed by the sequence of one to six bytes of its
      UTF-8 encoding, represented in their hexadecimal form. Thus, for
      example, the space character in a URN would appear as %20, and the
      % character itself would appear as %25. This mechanism does not
      cater for all Unicode or UTF-16 characters. Therefore, it is
      important not to include characters in a NewsItemId that cannot be
      encoded in UTF-8.

      with:

      Note that the set of characters that can be directly included
      within a URN is limited. The allowed characters are specified by
      the Internet Engineering Task Force (IETF) in its Request For
      Comments (RFC) number 2141. This document is available at
      http://www.ietf.org/rfc/rfc2141.txt. Any character that is not
      within the permitted URN character set must be converted to a
      sequence of legal characters as described in RFC 2141.

      The above injects the word "directly" into the first sentence and
      replaces the profoundly erroneous last four sentences with a single
      correct sentence.

      Misha


      -----Original Message-----
      From: Misha Wolf
      Sent: 28 October 2003 17:54
      To: NewsML (newsml@yahoogroups.com)
      Subject: RE: Nonsense re URNs in NewsML spec, DTD and schema


      I've been asked to say more about the difference between the NewsML
      Spec/DTD/Schema and reality, and to provide an example.

      There are three problems with the existing text:

      1. A minor wording problem:

      "Note that the set of characters that can be included within a
      URN is limited."

      That is not true. This is true:

      "Note that the set of characters that can be directly included
      within a URN is limited."

      2. A critical error in the description of the algorithm for
      converting a character which cannot be included directly to a
      sequence of legal characters (see the example below).

      3. A completely nonsensical statement:

      "This mechanism does not cater for all Unicode or UTF-16
      characters. Therefore, it is important not to include
      characters in a NewsItemId that cannot be encoded in UTF-8."

      That is nonsense on at least two levels:

      - The mechanism *does* cater for all Unicode characters.

      - UTF-16, like UTF-8, is an encoding form rather than a character
      set, hence a phrase such as "all Unicode or UTF-16 characters"
      cannot be parsed.

      And here is an example:

      Consider the character U+00A2 CENT SIGN. The hexadecimal UTF-8
      representation of U+00A2 is C2 A2. The existing documentation states
      that this should become %C2A2. In reality, it should become %C2%A2.
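
To illustrate the example above, here is a minimal Python sketch that builds both forms for U+00A2. The variable names are illustrative only. The cross-check with `urllib.parse.quote` is an assumption of convenience: it is a general URI percent-encoder from the Python standard library, not a URN-specific tool, but it escapes this character the same way.

```python
from urllib.parse import quote

cent = "\u00a2"                      # U+00A2 CENT SIGN
utf8_bytes = cent.encode("utf-8")    # two bytes: 0xC2 0xA2

# Form described by the existing documentation: one "%" in front of all
# the hexadecimal digits. This is not a valid escape sequence.
wrong = "%" + utf8_bytes.hex().upper()

# Correct form: every byte gets its own "%" prefix.
right = "".join("%{:02X}".format(b) for b in utf8_bytes)

print(wrong)        # %C2A2
print(right)        # %C2%A2
print(quote(cent))  # %C2%A2
```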

      Misha


      -----Original Message-----
      From: Misha Wolf
      Sent: 28 October 2003 17:25
      To: NewsML (newsml@yahoogroups.com)
      Subject: Nonsense re URNs in NewsML spec, DTD and schema


      I've just seen that the NewsML spec, DTD and schema state:

      Note that the set of characters that can be included within a URN
      is limited. The allowed characters are specified by the Internet
      Engineering Task Force (IETF) in its Request For Comments (RFC)
      number 2141. This document is available at
      http://www.ietf.org/rfc/rfc2141.txt. Any character that is not
      within the permitted URN character set must be represented as a %
      character followed by the sequence of one to six bytes of its
      UTF-8 encoding, represented in their hexadecimal form. Thus, for
      example, the space character in a URN would appear as %20, and the
      % character itself would appear as %25. This mechanism does not
      cater for all Unicode or UTF-16 characters. Therefore, it is
      important not to include characters in a NewsItemId that cannot be
      encoded in UTF-8.

      That is nonsense. This is the truth:

      Note that the set of characters that can be directly included
      within a URN is limited. The allowed characters are specified by
      the Internet Engineering Task Force (IETF) in its Request For
      Comments (RFC) number 2141. This document is available at
      http://www.ietf.org/rfc/rfc2141.txt. Any character that is not
      within the permitted URN character set must be converted to a
      sequence of legal characters as follows:
      1. The character is encoded using the UTF-8 encoding.
      2. Each of the resulting (one to six) bytes is expressed as a
      pair of hexadecimal digits, in the ASCII encoding.
      3. Each such pair of hexadecimal digits is prefixed with the
      character "%".
      Thus, for example, the space character in a URN would appear as
      %20, and the "%" character itself would appear as %25.
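
The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function name `urn_escape` and the `SAFE` set are hypothetical, and `SAFE` is deliberately conservative (ASCII letters and digits only). RFC 2141 also permits some punctuation directly, so consult it for the exact set. Python's UTF-8 encoder emits at most four bytes per character, which is the limit in the current definition of UTF-8; the "one to six" in the text reflects the older definition.

```python
import string

# Assumption: a conservative set of characters that may appear directly.
# RFC 2141 allows more than this; extend the set after checking the RFC.
SAFE = frozenset(string.ascii_letters + string.digits)


def urn_escape(text, safe=SAFE):
    """Escape every character not in `safe` as %XX per UTF-8 byte."""
    out = []
    for ch in text:
        if ch in safe:
            out.append(ch)
        else:
            # Step 1: encode the character as UTF-8.
            # Steps 2 and 3: write each byte as "%" plus two hex digits.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


print(urn_escape(" "))       # %20
print(urn_escape("%"))       # %25
print(urn_escape("\u00a2"))  # %C2%A2
print(urn_escape("a b"))     # a%20b
```

Because the "%" character is itself outside the safe set, it is always escaped as %25, so the output can be decoded unambiguously.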


      --
      Misha Wolf
      Standards Manager
      Content Architecture Group
      Reuters, 85 Fleet Street, London EC4P 4AJ

      Telephone +44 20 7542 6722
      Mobile +44 7990 56 6722
      Email misha.wolf@...
      Reuters Messaging misha.wolf.reuters.com@...




      ----------------------------------------------------------------
      Visit our Internet site at http://www.reuters.com

      Get closer to the financial markets with Reuters Messaging - for more
      information and to register, visit http://www.reuters.com/messaging

      Any views expressed in this message are those of the individual
      sender, except where the sender specifically states them to be
      the views of Reuters Ltd.