Re: [json] Re: Escaping unicode characters

  • Mark Miller
    Message 1 of 9, Aug 11, 2005
      Douglas Crockford wrote:
      > I don't think that JSON needs to care. It comes down to two questions:
      >
      > (A) How does a sender encode a supplementary character in UTF-16?

      > The answer to (A) is obvious: use the two character surrogate
      > encoding.

      The surrogate encoding uses two UTF-16 surrogate *code units* to encode a
      single character. These 16-bit surrogates are *not* characters.
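
To make the code unit versus character distinction concrete, here is a minimal sketch of the standard UTF-16 surrogate arithmetic. The function name `to_surrogate_pair` is hypothetical and chosen only for illustration; it is not part of any JSON library.

```python
def to_surrogate_pair(code_point):
    """Split one supplementary code point (U+10000..U+10FFFF) into the
    two 16-bit UTF-16 code units (high surrogate, low surrogate)."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


# U+1D11E MUSICAL SYMBOL G CLEF is one character but two code units.
high, low = to_surrogate_pair(0x1D11E)
print(f"{high:04X} {low:04X}")  # D834 DD1E
```

Neither D834 nor DD1E is a character on its own; only the pair, taken together, denotes U+1D11E.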

      Your answer to (A), once rephrased, is indeed the correct answer to
      question (A). But this is only relevant to JSON if you define a JSON string
      as Java or Javascript does: as a \u encoding of a sequence of UTF-16 code
      units. If a JSON string is a \u encoding of a sequence of characters, as it
      is in Python, then the UTF-16 question is not relevant. But the existing JSON
      spec provides no way to do an ASCII encoding of the supplementary characters
      (such as Python's \U encoding).
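
The two string models can be compared in a short sketch. It assumes a present-day Python 3 interpreter and its standard `json` module, which is later than this thread, so it illustrates the distinction rather than the tools available at the time.

```python
import json

# Python's \U escape names one supplementary character directly.
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF

# Python 3 strings are sequences of code points: one character.
print(len(clef))  # 1

# Encoded as UTF-16, the same character takes two 16-bit code units.
utf16_units = len(clef.encode("utf-16-be")) // 2
print(utf16_units)  # 2

# With its default ensure_ascii=True, json.dumps writes the character
# as a \u escape of each UTF-16 code unit (a surrogate pair).
escaped = json.dumps(clef)
print(escaped)  # "\ud834\udd1e"

# json.loads recombines the pair into the single original character.
assert json.loads(escaped) == clef
```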


      > (B) What does a receiver that is only able to handle BMP do with
      > supplementary characters?
      >
      > I think the answer to (B) is the same.
      >
      > There are many languages, such as Java and JavaScript and E, that are
      > unable to strictly do the right thing, but for now, the surrogate hack
      > is the state of the art.
      >
      > JSON's interest is to get the data from here to there without
      > distortion. JSON should be able to pass all of the Unicode characters,
      > including the extended characters. If a receiver chooses to filter
      > them out or replace them with surrogate pairs, that's its business.

      To live up to this fine goal, JSON needs to define an Ascii encoding of the
      supplementary characters. From (A), perhaps your intended answer is: Use the
      \u encoding of the UTF-16 code point encoding of the supplementary characters.
      This is Java's answer. AFAIK, it would be compatible with Javascript but not
      with Python. This would be an adequate answer -- Java and Javascript both live
      with it. I think the important thing is to make a definite choice.
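
The choice described above, a \u escape for each UTF-16 code unit, can be sketched as an encoder and decoder pair. The names `escape_to_ascii` and `unescape_from_ascii` are hypothetical, and the sketch assumes the input contains no backslashes or quotes, so it leaves out the other JSON string escapes.

```python
import re


def escape_to_ascii(text):
    """Return an ASCII-only form of text, writing each non-ASCII
    character as \\uXXXX escapes of its UTF-16 code units."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                      # plain ASCII passes through
        elif cp < 0x10000:
            out.append(f"\\u{cp:04x}")          # BMP: one code unit
        else:
            offset = cp - 0x10000               # supplementary: surrogate pair
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_from_ascii(escaped):
    """Inverse of escape_to_ascii: decode each \\uXXXX to a code unit,
    then merge surrogate pairs back into single characters."""
    units = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)
    # Round-tripping through UTF-16 joins each high/low surrogate pair;
    # an unpaired surrogate makes the final decode raise an error.
    return units.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


original = "G clef: \U0001D11E"
wire = escape_to_ascii(original)
print(wire)  # G clef: \ud834\udd1e
assert unescape_from_ascii(wire) == original
```

A receiver limited to the BMP could stop after the first step of `unescape_from_ascii` and keep the two surrogate code units as they are, which is the behavior the quoted text leaves to the receiver.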

      --
      Text by me above is hereby placed in the public domain

      Cheers,
      --MarkM
    • Douglas Crockford
      Message 2 of 9, Aug 11, 2005
        > > JSON's interest is to get the data from here to there without
        > > distortion. JSON should be able to pass all of the Unicode characters,
        > > including the extended characters. If a receiver chooses to filter
        > > them out or replace them with surrogate pairs, that's its business.

        > To live up to this fine goal, JSON needs to define an Ascii encoding of the
        > supplementary characters. From (A), perhaps your intended answer is: Use the
        > \u encoding of the UTF-16 code point encoding of the supplementary characters.
        > This is Java's answer. AFAIK, it would be compatible with Javascript but not
        > with Python. This would be an adequate answer -- Java and Javascript both live
        > with it. I think the important thing is to make a definite choice.

        That is the answer.
      • jemptymethod
        Message 3 of 9, Aug 12, 2005
          --- In json@yahoogroups.com, Mark Miller <markm@c...> wrote:
          >JSON is supposed to be a subset of both Javascript and Python

          I don't know from Python, but it seems to me JSON has drifted
          significantly from Javascript. These discussions are all well and
          good, but if it means that the JSON spec is modified to the point
          where it supports constructs that cannot be interpreted by
          Javascript, then it will cease being a subset thereof. JSON will
          instead become an entity unto itself, no longer "Javascript
          Object Notation".