61 Re: [json] Re: Escaping unicode characters

  • Mark Miller
    Aug 11, 2005
      Douglas Crockford wrote:
      > I don't think that JSON needs to care. It comes down to two questions:
      > (A) How does a sender encode a supplementary character in UTF-16?

      > The answer to (A) is obvious: use the two character surrogate
      > encoding.

      The surrogate encoding uses two UTF-16 surrogate *code units* to encode a
      single character. These 16-bit surrogates are *not* characters.

      Your answer to (A), once rephrased, is indeed the correct answer to
      question (A). But this is only relevant to JSON if you define a JSON string as
      Java or JavaScript does: as a \u encoding of a sequence of UTF-16 code units.
      If a JSON string is a \u encoding of a sequence of characters, as it is in
      Python, then the UTF-16 question is not relevant. But the existing JSON spec
      provides no way to do an ASCII encoding of the supplementary characters (such
      as Python's \U encoding).
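
A minimal Python sketch of the surrogate arithmetic discussed above, assuming a JSON string is treated as a \u encoding of UTF-16 code units. The function names `to_surrogate_pair` and `escape_ascii` are hypothetical, chosen only for illustration, and the sketch ignores the other JSON escapes (quotes, backslashes, control characters).

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into two UTF-16 surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def escape_ascii(text: str) -> str:
    """Escape every non-ASCII character as \\uXXXX, one escape per code unit.

    Hypothetical helper: quotes, backslashes and control characters are
    passed through untouched, so this is not a complete JSON string encoder.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp < 0x10000:
            out.append("\\u%04x" % cp)
        else:
            high, low = to_surrogate_pair(cp)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)
```

For example, U+1D11E lies outside the BMP, so `escape_ascii("\U0001D11E")` yields two escapes, `\ud834\udd1e`, neither of which is a character on its own.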

      > (B) What does a receiver that is only able to handle BMP do with
      > supplementary characters?
      > I think the answer to (B) is the same.
      > There are many languages, such as Java and JavaScript and E, that are
      > unable to strictly do the right thing, but for now, the surrogate hack
      > is the state of the art.
      > JSON's interest is to get the data from here to there without
      > distortion. JSON should be able to pass all of the Unicode characters,
      > including the extended characters. If a receiver chooses to filter
      > them out or replace them with surrogate pairs, that's its business.

      To live up to this fine goal, JSON needs to define an ASCII encoding of the
      supplementary characters. From (A), perhaps your intended answer is: Use the
      \u encoding of the UTF-16 code unit encoding of the supplementary characters.
      This is Java's answer. AFAIK, it would be compatible with JavaScript but not
      with Python. This would be an adequate answer -- Java and JavaScript both live
      with it. I think the important thing is to make a definite choice.

      Text by me above is hereby placed in the public domain
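
For the receiving side of the choice described in the message, here is a rough Python sketch of how a decoder might recombine a \u-escaped surrogate pair into a single character. The name `unescape_ascii` is hypothetical, and the sketch assumes the input contains only ASCII text and \uXXXX escapes (no escaped backslashes or other JSON escapes).

```python
import re

_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_ascii(escaped: str) -> str:
    """Decode \\uXXXX escapes as UTF-16 code units, then join surrogate pairs."""
    # Step 1: turn the input into a list of code units.
    units = []
    pos = 0
    for match in _ESCAPE.finditer(escaped):
        units.extend(ord(c) for c in escaped[pos:match.start()])
        units.append(int(match.group(1), 16))
        pos = match.end()
    units.extend(ord(c) for c in escaped[pos:])

    # Step 2: combine each high/low surrogate pair into one code point.
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            out.append(chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)))
            i += 2
        else:
            # BMP character, or a lone surrogate passed through unchanged.
            out.append(chr(unit))
            i += 1
    return "".join(out)
```

A receiver limited to the BMP could stop after step 1 and keep the two code units as they are, which is the behaviour the quoted text leaves to the receiver.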
