JSON -- rfc question -- Encoding

  • json_is_clever
    Message 1 of 8 , Apr 12, 2007
      I was having a discussion with an implementer of a JSON library, and
      over time the discussion boiled down to an argument over the
      interpretation of one sentence in the RFC. That sentence is the first
      one in section "3. Encoding", to wit "JSON text SHALL be encoded in
      Unicode."

      Hopefully this will draw a response from Douglas Crockford, since it
      may be that he is the only one that really knows what he meant by what
      he said in the RFC.... but opinions from others are welcome.

      The implementer claims that, because Unicode is not a binary encoding
      but a set of codepoints, this sentence means that Unicode codepoints
      should be used in some manner to represent characters, but that the
      bitstream that represents the JSON text could use escapes for most of
      the characters (reducing the size of the character repertoire), and
      then be encoded in, say, EBCDIC. While it is possible to conceive of
      doing such a thing, I question whether his interpretation is valid.

      My interpretation of that same sentence is that the character codes
      should indeed be Unicode codepoints, and in addition, they should be
      encoded in one of the 5 UTF-* Unicode encodings listed later in the
      same section. Clearly by using escapes, it is possible to reduce the
      character repertoire to ASCII, or even a subset of ASCII, but ASCII is
      a subset of Unicode's UTF-8 encoding, so that is valid.
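
A minimal sketch of the ASCII-subset point above, assuming Python's standard `json` module (the thread itself does not name a language or library). With escaping turned on, the generated text contains only ASCII octets, and those same octets are already valid UTF-8:

```python
import json

text = "Schöne Grüße"

# ensure_ascii=True (the default) escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(text)
wire_bytes = escaped.encode("ascii")  # succeeds: only ASCII characters remain

# The same octets are valid UTF-8, so a UTF-8 decoder reads them unchanged.
decoded = json.loads(wire_bytes.decode("utf-8"))

print(escaped)
print(decoded == text)
```

The reduced repertoire changes nothing about the encoding: the text is still UTF-8, it just happens to use only the one-octet code points.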

      So I'd welcome interpretations on exactly what encodings are legal
      JSON texts.

      Some follow-up questions are:

      1) Must a conformant JSON parser recognize all legal encodings? Or
      can a parser be conformant with documented restrictions on what
      encodings it can accept?

      2) Is a conformant JSON generator allowed to have options to generate
      text that is not quite JSON conformant? Or must it be completely
      limited to producing legal JSON text only?
    • Douglas Crockford
      Message 2 of 8 , Apr 13, 2007
        --- In json@yahoogroups.com, "json_is_clever" <vendor@...> wrote:

        > The implementer claims that because Unicode is not a binary encoding,
        > but a set of codepoints, that this sentence means that Unicode
        > codepoints should be used in some manner to represent characters, but
        > that the bitstream that represents the JSON text could use escapes for
        > most of the characters (reducing the size of the character
        > repertoire), and then be encoded in, say, EBCDIC. While it is
        > possible to conceive of doing such a thing, I question if his
        > interpretation is valid.

        No, that is a wild misreading of the RFC. The JSON text must be
        represented in Unicode, and the preferred encoding is UTF-8.
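
For readers wondering how a receiver tells the UTF encodings apart, here is a hedged sketch of the null-octet pattern test described in section 3 of RFC 4627. It relies on that RFC's rule that a JSON text starts with an object or array, so the first two characters are ASCII. The function name `detect_json_encoding` and the fallback for very short input are assumptions made for illustration:

```python
def detect_json_encoding(octets: bytes) -> str:
    """Guess the UTF encoding from the null pattern in the first four octets."""
    if len(octets) < 4:
        return "utf-8"  # assumption: too short to inspect, use the default
    b0, b1, b2, b3 = octets[:4]
    if b0 == 0 and b1 == 0 and b2 == 0:
        return "utf-32-be"  # 00 00 00 xx
    if b0 == 0 and b2 == 0:
        return "utf-16-be"  # 00 xx 00 xx
    if b1 == 0 and b2 == 0 and b3 == 0:
        return "utf-32-le"  # xx 00 00 00
    if b1 == 0 and b3 == 0:
        return "utf-16-le"  # xx 00 xx 00
    return "utf-8"          # xx xx xx xx


sample = '{"a": 1}'
for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(name, detect_json_encoding(sample.encode(name)))
```

Note that EBCDIC has no place in this table, which is consistent with the answer above.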

        > 1) Must a conformant JSON parser recognize all legal encodings? Or
        > can a parser be conformant with documented restrictions on what
        > encodings it can accept?

        Parties can agree on what is acceptable and meaningful. For example,
        it is reasonable for a receiver to put limits on message length or
        string length or nesting depth.
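
As an illustration of such receiver-side limits, the sketch below wraps Python's standard `json.loads` with a length check and a nesting-depth check. The names `MAX_BYTES`, `MAX_DEPTH`, `nesting_depth`, and `parse_with_limits` are hypothetical, and the limit values are arbitrary:

```python
import json

MAX_BYTES = 1_000_000  # hypothetical limit on message length
MAX_DEPTH = 32         # hypothetical limit on nesting depth


def nesting_depth(value, depth=0):
    """Return how many containers deep the most nested value sits."""
    if isinstance(value, dict):
        children = value.values()
    elif isinstance(value, list):
        children = value
    else:
        return depth
    # An empty container still counts as one level.
    return max((nesting_depth(c, depth + 1) for c in children), default=depth + 1)


def parse_with_limits(octets: bytes):
    """Parse UTF-8 JSON, rejecting messages that exceed the agreed limits."""
    if len(octets) > MAX_BYTES:
        raise ValueError("message too long")
    value = json.loads(octets.decode("utf-8"))
    if nesting_depth(value) > MAX_DEPTH:
        raise ValueError("nesting too deep")
    return value
```

Because the depth check runs after parsing, it documents the agreed limit rather than protecting the parser itself; a streaming parser would be needed to reject deep input early.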

        > 2) Is a conformant JSON generator allowed to have options to generate
        > text that is not quite JSON conformant? Or must it be completely
        > limited to producing legal JSON text only?

        A JSON generator may only produce valid JSON text.
      • Michael Schwarz
        Message 3 of 8 , Apr 13, 2007
          I have one more question: do I need to convert Unicode characters to
          something like "\u12345"?

          Michael


        • Mark Miller
          Message 4 of 8 , Apr 13, 2007
            Douglas Crockford wrote:
            >> 1) Must a conformant JSON parser recognize all legal encodings? Or
            >> can a parser be conformant with documented restrictions on what
            >> encodings it can accept?
            >
            > Parties can agree on what is acceptable and meaningful. For example,
            > it is reasonable for a receiver to put limits on message length or
            > string length or nesting depth.

            For example, JSON in E-0.9 accepts only Unicode characters from the "basic
            multilingual plane", i.e., characters whose code point fits in 16 bits. As I
            read the JSON spec, this is an allowable restriction. This restriction is
            needed since E-0.9 does not support Unicode supplementary characters, i.e.,
            characters whose code points are >= 2**16.
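
A rough sketch of what such a restriction looks like in practice, written in Python rather than E (an assumption made purely for illustration; this is not how E-0.9 implements it). The helper name `supplementary_chars` is hypothetical:

```python
import json


def supplementary_chars(text: str) -> list:
    """Return the characters whose code points are >= 2**16."""
    return [ch for ch in text if ord(ch) >= 2**16]


bmp_only = json.loads('"Gr\\u00fc\\u00dfe"')
# In JSON, a supplementary character is escaped as a UTF-16 surrogate pair;
# this pair decodes to the single character U+1D11E.
with_clef = json.loads('"\\ud834\\udd1e"')

print(supplementary_chars(bmp_only))        # empty: acceptable under the restriction
print(len(supplementary_chars(with_clef)))  # one offending character
```

A receiver applying the restriction would reject any string for which the list is non-empty.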

            --
            Text by me above is hereby placed in the public domain

            Cheers,
            --MarkM
          • json_is_clever
            Message 5 of 8 , Apr 14, 2007
              --- In json@yahoogroups.com, "Douglas Crockford" <douglas@...> wrote:

              > No, that is a wild misreading of the RFC. The JSON text must be
              > represented in Unicode, and the preferred encoding is UTF-8.

              Thanks for the clarification, Douglas.
            • Douglas Crockford
              Message 6 of 8 , Apr 14, 2007
                --- In json@yahoogroups.com, "Michael Schwarz" <michael.schwarz@...>
                wrote:
                >
                > I have one more question, do I need to convert unicode characters to
                > something like "\u12345"?

                No. The \u notation is only required for some of the control characters.
              • Michael Schwarz
                Message 7 of 8 , Apr 14, 2007
                  Hi,

                  Currently I only have something like \r, \n, or \t. Are those
                  characters allowed? Which control characters are you talking
                  about? Thanks a lot.

                  Michael





                  --
                  Best regards | Schöne Grüße
                  Michael

                  Microsoft MVP - Most Valuable Professional
                  Microsoft MCAD - Certified Application Developer

                  http://weblogs.asp.net/mschwarz/
                  http://www.ajaxpro.info/

                  WPF/E: http://groups.google.com/group/wpf-everywhere

                  Skype: callto:schwarz-interactive
                  MSN IM: passport@...
                • Douglas Crockford
                  Message 8 of 8 , Apr 14, 2007
                    --- In json@yahoogroups.com, "Michael Schwarz" <michael.schwarz@...>
                    wrote:

                    > currently I have only something like \r, \n or \t, are those chars
                    > allowed? Which control chars are you talking about? Thanks a lot.

                    None of the control characters can appear in JSON strings. You can use
                    the \u convention to represent them. A few of them, such as linefeed
                    and tab, have more compact representations such as \n and \t. Take
                    your pick.
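
To make the two representations concrete, here is a small sketch using Python's standard `json` module (an assumption; the thread does not name a library). The generator picks the compact form where one exists and falls back to `\u` otherwise, while the parser in its default strict mode refuses a raw control character inside a string:

```python
import json

# Tab and linefeed get the compact escapes; U+0001 has none, so \u0001 is used.
encoded = json.dumps("a\tb\nc\x01d")
print(encoded)

# Both spellings of tab decode to the same character.
same_tab = json.loads('"\\t"') == json.loads('"\\u0009"')
print(same_tab)

# A raw (unescaped) tab inside a JSON string is rejected by the strict parser.
try:
    json.loads('"a\tb"')
    raw_control_accepted = True
except json.JSONDecodeError:
    raw_control_accepted = False
print(raw_control_accepted)
```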