
1943 Re: [json] JSON strings cannot point to post-BMP Unicode codepoints?

  • John Cowan
    Apr 7 11:21 AM
      Shriramana Sharma scripsit:

      > However, if that restriction of *four* hex digits is meant to be
      > enforced, then it means that post-BMP codepoints (such as 0x11005
      > BRAHMI LETTER A) cannot be represented in such strings directly, but
      > that they have to be manually (i.e. by the program outputting JSON-ed
      > data) decomposed into their equivalent UTF16 surrogate pairs (for
      > instance, 0xd804 0xdc05).

      It can be represented either as the actual character, 4 bytes in any
      of UTF-8, UTF-16, or UTF-32; or else as two consecutive ASCII escapes:
      "\uD804\uDC05".
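
To illustrate the escape form, here is a minimal Python sketch of the surrogate-pair arithmetic for the code point in question (0x11005). The helper name `to_surrogate_escapes` is hypothetical, chosen only for this example; the only library assumed is Python's standard `json` module, used to check that a parser reads the two escapes back as one character.

```python
import json


def to_surrogate_escapes(cp: int) -> str:
    """Return the two four-hex-digit escapes for a code point above U+FFFF."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000              # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return "\\u{:04X}\\u{:04X}".format(high, low)


# BRAHMI LETTER A, the example from the quoted message
escaped = to_surrogate_escapes(0x11005)

# Wrap the escapes in quotes to form a JSON string and parse it back
decoded = json.loads('"' + escaped + '"')
```

Under these assumptions `escaped` holds the same pair of escapes quoted above, and `decoded` is a single character, so the program producing the JSON only needs this arithmetic if it chooses to emit ASCII-only output.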

      > IMHO this is an unnecessary restriction.

      JSON is backward compatible by design with ECMAScript 3, which does not
      support the \U escape.

      > Does it mean that even though there is no \U notation, I can directly
      > input post-BMP codepoints as part of the string literals?

      Correct.
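
As a rough check of the literal-character route, the following Python sketch (standard library only; the variable names are illustrative) encodes the same character in the three encodings mentioned earlier and writes it into a JSON string without any escape. It relies on `json.dumps(..., ensure_ascii=False)`, which leaves non-ASCII characters unescaped.

```python
import json

# BRAHMI LETTER A as an actual character, not an escape
brahmi_a = "\U00011005"

# Byte lengths in each encoding (big-endian variants avoid a BOM prefix)
utf8_len = len(brahmi_a.encode("utf-8"))
utf16_len = len(brahmi_a.encode("utf-16-be"))
utf32_len = len(brahmi_a.encode("utf-32-be"))

# Emit the character directly inside the JSON string literal
literal_json = json.dumps(brahmi_a, ensure_ascii=False)

# A parser reads the literal form back as the same character
round_tripped = json.loads(literal_json)
```

Each of the three lengths comes out as 4, and `literal_json` is just the character between two double quotes.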

      > In this case even the \u notation is only there as a just-in-case?

      Just so.

      --
      There are three kinds of people in the world: John Cowan
      those who can count, cowan@...
      and those who can't.