Re: [json] Re: Escaping unicode characters
Aug 11, 2005
Douglas Crockford wrote:
> I don't think that JSON needs to care. It comes down to two questions:
> (A) How does a sender encode a supplementary character in UTF-16?
> The answer to (A) is obvious: use the two character surrogate

The surrogate encoding uses two UTF-16 surrogate *code units* to encode a
single character. These 16-bit surrogates are *not* characters.
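
To make the code unit versus character distinction concrete, here is a minimal sketch of the standard UTF-16 surrogate arithmetic. The function name to_surrogate_pair is hypothetical and chosen only for this illustration; it is not taken from any JSON library.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split one supplementary code point into two UTF-16 code units.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return high, low


# U+1D11E MUSICAL SYMBOL G CLEF is one character but two code units.
high, low = to_surrogate_pair(0x1D11E)
print(f"{high:04X} {low:04X}")  # D834 DD1E
```

Neither D834 nor DD1E is a character on its own; only the pair, taken together, identifies U+1D11E.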
Your answer to (A), once rephrased, is indeed the correct answer to the (A)
question. But this is only relevant to JSON if you define a JSON string as
If a JSON string is a \u encoding of a sequence of characters, as it is in
Python, then the UTF-16 question is not relevant. But the existing JSON spec
provides no way to do an Ascii encoding of the supplementary characters (such
as Python's \U encoding).
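
As a rough illustration of the two views, the sketch below uses present-day Python 3 and its standard json module, both of which postdate this thread, so it shows one possible resolution rather than what the spec under discussion said. Python's \U escape names the character directly, while the json module writes the same character as two \u escapes, one per UTF-16 code unit.

```python
import json

clef = "\U0001D11E"           # Python's \U escape: one character
assert len(clef) == 1         # a single code point in Python 3

# UTF-16 needs two 16-bit code units for the same character.
utf16 = clef.encode("utf-16-be")
assert utf16 == b"\xd8\x34\xdd\x1e"

# json.dumps (ensure_ascii=True by default) emits one \u escape per code unit.
escaped = json.dumps(clef)
print(escaped)                # "\ud834\udd1e"

# Decoding recombines the pair into the original single character.
assert json.loads(escaped) == clef
```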
> (B) What does a receiver that is only able to handle BMP do with
> supplementary characters?
> I think the answer to (B) is the same.
> unable to strictly do the right thing, but for now, the surrogate hack
> is the state of the art.
> JSON's interest is to get the data from here to there without
> distortion. JSON should be able to pass all of the Unicode characters,
> including the extended characters. If a receiver chooses to filter
> them out or replace them with surrogate pairs, that's its business.

To live up to this fine goal, JSON needs to define an Ascii encoding of the
supplementary characters. From (A), perhaps your intended answer is: Use the
\u encoding of the UTF-16 code point encoding of the supplementary characters.
with it. I think the important thing is to make a definite choice.
Text by me above is hereby placed in the public domain